Analyzing Data: Measures of Central Tendency, Plots, and Outliers

Added on  2019/09/23

Homework Assignment
AI Summary
This document presents a comprehensive solution to a statistics assignment, delving into various measures of central tendency, including arithmetic mean, median, and mode, and their application in different scenarios. It demonstrates the use of stem plots and box plots for data visualization, and explains how to calculate and interpret these plots. The assignment covers frequency tables, the identification of outliers, and the calculation of the interquartile range. Furthermore, it illustrates the practical application of these statistical concepts through several example problems, such as calculating a student's required score on a final exam to achieve a desired average, analyzing the impact of removing an outlier from a dataset, and calculating the weighted mean of car sales. The solutions provide detailed step-by-step calculations and explanations, making it a valuable resource for students studying statistics.
Q1.
Measures of Central Tendency
A measure of central tendency is a single value that identifies the point around which the
data are centred. These measures are used more widely than other statistical summaries
because they are easy to compute and easy to apply: a data set is described by
identifying its central position.
The common measures of central tendency are the arithmetic mean, the median, the
mode, the weighted mean, and the geometric mean. Each is appropriate under particular
conditions, which are given below.
1) Arithmetic Mean: The sum of all the observations divided by the number of
observations. It is used to measure the centre of the data when the variable is
interval/ratio and the distribution is not skewed.
2) The median: The middle value of the observations once they are sorted in
ascending or descending order. It is used when the relative position of the
observations matters, and it is generally preferred when the data are skewed or ordinal.
3) The mode: The value that occurs most frequently in a data set. It is generally used
with nominal data.
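As a rough illustration of the three measures, the sketch below uses Python's standard statistics module on a small made-up sample. The list scores is hypothetical and is not taken from the assignment; it is chosen so that one large value pulls the mean above the median.

```python
from statistics import mean, median, mode

# Hypothetical sample (not from the assignment); 12 is deliberately large.
scores = [2, 3, 3, 5, 12]

sample_mean = mean(scores)      # sum of the values divided by their count
sample_median = median(scores)  # middle value of the sorted data
sample_mode = mode(scores)      # most frequently occurring value

print(sample_mean, sample_median, sample_mode)
```

Here the mean is larger than the median, which is the usual sign of a right-skewed sample and the reason the median is preferred for skewed data.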
Q2.
A stem plot is a special table in which the leading digit of each number forms the
stem and the last digit forms the leaf. It presents the frequency of quantitative
data in a graphical format.
First arrange the data from smallest to largest:
55 56 59 62 63 64 64 64 69 77 77 85
Take the first digit of each number as its stem, starting with the smallest number, 55,
and write its last digit as a leaf.
Hence the stem plot for the given data set is
Stem Leaves
5 5 6 9
6 2 3 4 4 4 9
7 7 7
8 5
The mean of the given data is
(55 + 77 + 64 + 77 + 69 + 63 + 62 + 64 + 85 + 64 + 56 + 59)/12
= 795/12
= 66.25
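The stem plot and mean above can be reproduced with a short sketch. The function name stem_plot is an illustrative choice, and the code assumes two-digit whole numbers, so that the tens digit is the stem and the units digit is the leaf.

```python
from collections import defaultdict

data = [55, 77, 64, 77, 69, 63, 62, 64, 85, 64, 56, 59]

def stem_plot(values):
    """Map each stem (tens digit) to its sorted leaves (units digits).

    Assumes two-digit whole numbers, as in this data set.
    """
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return dict(stems)

plot = stem_plot(data)
for stem in sorted(plot):
    print(stem, " ".join(str(leaf) for leaf in plot[stem]))

mean_value = sum(data) / len(data)
print(mean_value)
```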
Q3.
(a)
Mean = sum of all observations / total number of observations
= (27 + 30 + 21 + 62 + 28 + 18 + 23 + 22 + 26 + 28) / 10
= 285 / 10
= 28.5
Hence mean = 28.5
As the number of observations is even, the median is the mean of the values
at positions n/2 and (n+2)/2
n/2 = 10/2 = 5th position
(n+2)/2 = 12/2 = 6th position
First sort the observations in ascending order:
18 21 22 23 26 27 28 28 30 62
The value at the 5th position is 26
The value at the 6th position is 27
The mean of the values at the 5th and 6th positions is (26 + 27)/2
= 26.5
Hence median = 26.5
(b)
In the above scenario the mean is 28.5 and the median is 26.5. The median is therefore
the better measure of central tendency, because the distribution is skewed and has only a
small number of observations.
(c)
An outlier is an observation that lies far from the other observations. Thus, the
outlier in the above set of observations is the value 62.
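The answer above identifies the outlier by inspection. One common way to make this check explicit is the 1.5 × IQR rule, which is an assumption added here and not part of the original solution. The sketch takes the quartiles as the medians of the lower and upper halves of the sorted data and assumes at least two observations.

```python
from statistics import mean, median

values = sorted([27, 30, 21, 62, 28, 18, 23, 22, 26, 28])

data_mean = mean(values)
data_median = median(values)

# Quartiles as medians of the lower and upper halves (one of several conventions).
half = len(values) // 2
q1 = median(values[:half])
q3 = median(values[-half:])
iqr = q3 - q1

# Assumed rule: anything beyond 1.5 * IQR from the quartiles is flagged.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [v for v in values if v < low_fence or v > high_fence]

print(data_mean, data_median, outliers)
```

Under this rule only 62 falls outside the fences, which agrees with the value picked out by inspection.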
Q4.
The four test scores are 74%, 68%, 84%, and 79%
We have to find the fifth test score (the final exam) that gives an average of 75%
Let x be the fifth test score, thus
(74 + 68 + 84 + 79 + x)/5 = 75
305 + x = 375
x = 375 – 305
x = 70
Hence, the minimum score he needs on the final exam to pass the class with a 75%
average is 70%
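The same rearrangement can be written as a small helper. The name required_score is hypothetical; the function simply solves (sum of scores + x)/(n + 1) = target for x.

```python
def required_score(scores, target_average):
    """Score needed on one more test so the overall average hits the target."""
    total_tests = len(scores) + 1
    return target_average * total_tests - sum(scores)

needed = required_score([74, 68, 84, 79], 75)
print(needed)
```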
Q5.
(a)
The frequency of a value is the number of times it occurs in the given set of data.
The frequency table below is built from the data
1 2 2 2 3 3 3 3 4 5 5 5 6 6 6 6 6 7 7 9 9 10 10
No. of books read Frequency
1-2 4
3-4 5
5-6 8
7-8 2
9-10 4
(b)
Mean of raw data = (1*1 + 2*3 + 3*4 + 4*1 + 5*3 + 6*5 + 7*2 + 9*2 + 10*2)/
(1+3+4+1+3+5+2+2+2)
= 120/23
= 5.22 (rounded to two decimal places)
Hence mean = 5.22
(c)
For calculating the median, the total number of observations is 23, which is an odd
number, therefore the median is the value at position (n+1)/2
= (23 + 1)/2
= 12th position
The 12th value in the sorted data is 5
Hence median = 5
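As a cross-check, the frequency table, mean, and median can be recomputed directly from the 23 raw values. The variable names below are illustrative. Note that the mean of the raw data must lie between the smallest value (1) and the largest value (10).

```python
from statistics import mean, median

books = [1, 2, 2, 2, 3, 3, 3, 3, 4, 5, 5, 5,
         6, 6, 6, 6, 6, 7, 7, 9, 9, 10, 10]

# Class intervals used in the frequency table.
bins = [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)]
table = {f"{lo}-{hi}": sum(lo <= b <= hi for b in books) for lo, hi in bins}

raw_mean = mean(books)      # 120 / 23
raw_median = median(books)  # 12th of the 23 sorted values

print(table)
print(round(raw_mean, 2), raw_median)
```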
Q6.
(a)
Mean = (2 + 5 + 3 + 10 + 0 + 4 + 2)/7
= 26/7
= 3.71
For the median, arrange the data in ascending order:
0 2 2 3 4 5 10
The number of observations is odd, so the median is at position (n+1)/2
= (7 + 1)/2
= 4th position
The value at the 4th position is 3, hence median = 3
The mode is the most frequently occurring value in the given set of data, hence the mode
is 2.
Thus, Mean = 3.71, Median = 3 and Mode = 2
(b)
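The best measure of central tendency here is the median. The data are numerical rather
than nominal, and the value 10 is an outlier that pulls the mean upward, whereas the
median is not affected by a single extreme value.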
(c)
If Wednesday is removed, the value 10 is removed from the data set. The effect on each
of the three measures of central tendency is described below
Mean = (2 + 5 + 3 + 0 + 4 + 2)/6
= 16/6
= 2.67
For the median, the number of observations is now even, so use positions n/2 and (n+2)/2
The sorted data are
0 2 2 3 4 5
n/2 = 6/2 = 3
(n+2)/2 = 8/2 = 4
The mean of the values at the 3rd and 4th positions is (2 + 3)/2 = 2.5
Hence median is 2.5
The mode remains 2, as it is still the only value whose frequency is higher than that of
any other value in the data set.
(d)
The outlier in the above data set is the value 10, as it lies far from all other values in
the data set.
Hence, if the outlier is removed, the mean falls from 3.71 to 2.67 and the median
falls from 3 to 2.5. Both move closer to the bulk of the data, and the mean changes more
than the median. The mode remains 2, since the frequency of that value does not depend
on the outlier.
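The effect of dropping the outlier can be checked with a short before-and-after comparison. This is a minimal sketch using the standard statistics module; the variable names are illustrative.

```python
from statistics import mean, median, mode

values = [2, 5, 3, 10, 0, 4, 2]
trimmed = [v for v in values if v != 10]  # drop the outlier 10

# (mean rounded to 2 decimals, median, mode) with and without the outlier
before = (round(mean(values), 2), median(values), mode(values))
after = (round(mean(trimmed), 2), median(trimmed), mode(trimmed))

print(before)
print(after)
```

The mean moves by about one unit while the median moves by half a unit and the mode does not move at all, which shows how much more sensitive the mean is to a single extreme value.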
Q7.
(a)
First calculate the sum of (purchase price * number of cars sold), which is equal to
= (15000 * 3) + (20000 * 4) + (23000 * 5) + (25000 * 2) + (45000 * 1)
= 45000 + 80000 + 115000 + 50000 + 45000
= 335000
Now mean = 335000 / total number of cars sold
= 335000/(3 + 4 + 5 + 2 + 1)
= 335000/15
= 22333.33
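Now the median is at position (n+1)/2, where n = 15 is the number of cars sold
= (15 + 1)/2
= 8th position
In order of price, cars 1 to 3 sold at 15000, cars 4 to 7 at 20000, and cars 8 to 12 at 23000
Hence the median is the value at the 8th position, that is 23000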
(b)
The mean (22333.33) and the median (23000) are close, so either describes the centre of
the data reasonably well. The median is the safer choice because it is not affected by
the outlying price of 45000.
(c)
The outlier in this data set is the value 45000, as it lies far from the other
prices. Because it occurs only once among 15 sales its effect is moderate, but it is not
zero: removing it lowers the mean from 22333.33 to 290000/14 = 20714.29 and the median
from 23000 to (20000 + 23000)/2 = 21500.
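The weighted mean and the median for the car sales can be checked by expanding the table into one price per car sold. The dictionary sales and the other names are illustrative; the prices and counts are those used in part (a).

```python
from statistics import mean, median

# purchase price -> number of cars sold at that price
sales = {15000: 3, 20000: 4, 23000: 5, 25000: 2, 45000: 1}

total_cars = sum(sales.values())
weighted_mean = sum(price * count for price, count in sales.items()) / total_cars

# One entry per car, so the median is taken over all 15 sales.
prices = [price for price, count in sales.items() for _ in range(count)]
median_price = median(prices)

# Same measures with the single 45000 sale removed.
without_outlier = [p for p in prices if p != 45000]
mean_without = mean(without_outlier)
median_without = median(without_outlier)

print(round(weighted_mean, 2), median_price)
print(round(mean_without, 2), median_without)
```

Taking the median over the 15 individual sales, rather than over the 5 distinct prices, gives each price its proper weight.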
Q8.
A box plot represents statistical data on a plot in which a rectangle is drawn
from the first quartile to the third quartile, usually with a vertical line inside to
indicate the median value. Lines (whiskers) on either side of the rectangle extend to the
minimum and maximum values.
Hence, here in the given box plot
A represents the minimum value
B represents the first quartile
C is the median value
D is the third quartile
E is the maximum value
Q9.
(a)
The first step in plotting the box plot is to calculate the median, so sort the list as
55 64 73 79 81 85 86 87 90 93
The number of observations is 10, which is even,
therefore the median is at positions n/2 and (n+2)/2
n/2 = 10/2 = 5th position
(n+2)/2 = 12/2 = 6th position
The mean of the values at the 5th and 6th positions is
(81 + 85)/2
= 83
Thus the median is 83
Now split the list into two halves around the median 83
The first half is 55 64 73 79 81, whose median is 73
The second half is 85 86 87 90 93, whose median is 87
Thus, in the box plot
Minimum (A) is 55
Maximum (E) is 93
Median (C) is 83
First quartile (B) is 73
Third quartile (D) is 87
The box plot looks like
55-------73-----------83----------87-----------93
(b)
The value 55 is Letter A in the box plot
The value 73 is Letter B in the box plot
The value 83 is Letter C in the box plot
The value 87 is Letter D in the box plot
The value 93 is Letter E in the box plot
Q10.
(a)
The median is C = 83. Because the number of observations is even, it is the mean of the
two middle values, 81 and 85.
(b)
The range of the data is the difference between the maximum value and the minimum value
= 93 – 55
= 38
(c)
The interquartile range is the difference between the third quartile and the first quartile
= D – B
= 87 – 73
= 14
(d)
The line inside the box in the above box plot marks the median of the whole data set, and
the range is the difference between the maximum value and the minimum value. The first
quartile is the median of the first half of the list and the third quartile is the median
of the second half of the list.
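The five values behind the box plot, together with the range and interquartile range, can be sketched as follows. The function name five_number_summary is hypothetical, and the quartiles use the median-of-halves method described in Q9; other quartile conventions (for example those in spreadsheet or statistics packages) may give slightly different values.

```python
from statistics import median

scores = [55, 64, 73, 79, 81, 85, 86, 87, 90, 93]

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using medians of the two halves.

    Assumes at least two values. For an odd count, the middle value is
    left out of both halves.
    """
    data = sorted(values)
    half = len(data) // 2
    lower, upper = data[:half], data[-half:]
    return data[0], median(lower), median(data), median(upper), data[-1]

a, b, c, d, e = five_number_summary(scores)
data_range = e - a  # maximum minus minimum
iqr = d - b         # third quartile minus first quartile

print(a, b, c, d, e)
print(data_range, iqr)
```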