Normal Distribution: Testing for Adherence to Theoretical Properties

Added on  2023/06/03

This article discusses the testing of adherence of baseball game attending cost to normal distribution using various statistical tools and theoretical properties.
STATISTICS
STUDENT NAME/ID
[Pick the date]
Part A
Mean time to download the home page of the website μ=1.2 seconds
Standard deviation to download the home page of the website σ =0.2 seconds
Download time follows Normal distribution.
(a) Probability that the download time for the home page of the website would be above 1.8
seconds.
P ( x>1.8 )=?
The probability that download time is higher than 1.8 seconds is equal to the blue area
highlighted under the curve.
P ( x>1.8 )=P ( x - μ > 1.8 - 1.2 )=P ( (x - μ)/σ > (1.8 - 1.2)/0.2 )
P ( x>1.8 ) =P ( z>3 )
Using standard normal distribution table
P ( z>3 )=0.0013
Hence,
P ( x>1.8 )=0.0013
Therefore, there is a 0.0013 probability that the download time for the home page of the website
would be above 1.8 seconds.
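The table lookup in part (a) can be cross-checked numerically. The following is a minimal sketch, assuming Python's standard-library `statistics.NormalDist` as the tool (the original working uses a printed standard normal table); the variable names are illustrative only.

```python
from statistics import NormalDist

mu, sigma = 1.2, 0.2                 # mean and standard deviation given in Part A
download = NormalDist(mu, sigma)     # assumed model: download time ~ Normal(1.2, 0.2)

z = (1.8 - mu) / sigma               # standardised value of 1.8 seconds
p_above = 1 - download.cdf(1.8)      # upper-tail probability P(x > 1.8)

print(f"z = {z:.2f}")
print(f"P(x > 1.8) = {p_above:.4f}")
```

Working from the cumulative distribution function avoids the rounding built into a printed table, although to four decimal places the two approaches agree here.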
(b) Probability that the download time for the home page of the website would be between 1.5
seconds and 2.5 seconds.
P ( 1.5< x< 2.5 )=?
The probability that download time is between 1.5 seconds and 2.5 seconds is equal to the blue
area highlighted under the curve.
P ( 1.5< x< 2.5 )=P ( 1.5 - 1.2 < x - μ < 2.5 - 1.2 )=P ( (1.5 - 1.2)/0.2 < (x - μ)/σ < (2.5 - 1.2)/0.2 )
P ( 1.5< x< 2.5 )=P ( 1.5< z <6.5 )
Using standard normal distribution table
P ( 1.5< z< 6.5 )=0.0668
Hence,
P ( 1.5< x< 2.5 )=0.0668
Therefore, there is a 0.0668 probability that the download time for the home page of the website
would be between 1.5 seconds and 2.5 seconds.
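The interval probability in part (b) is the difference of two cumulative probabilities. A minimal sketch of that calculation, again assuming Python's standard-library `statistics.NormalDist` rather than the table used in the original working:

```python
from statistics import NormalDist

mu, sigma = 1.2, 0.2
download = NormalDist(mu, sigma)

z_low = (1.5 - mu) / sigma           # lower bound in standard units
z_high = (2.5 - mu) / sigma          # upper bound in standard units

# P(1.5 < x < 2.5) = F(2.5) - F(1.5), where F is the cumulative distribution function
p_between = download.cdf(2.5) - download.cdf(1.5)

print(f"z range: {z_low:.1f} to {z_high:.1f}")
print(f"P(1.5 < x < 2.5) = {p_between:.4f}")
```

Because the upper bound lies 6.5 standard deviations above the mean, almost all of this probability comes from the tail beyond z = 1.5.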
(c) Let x be the time below which 95% of the download times for the home page of the website fall.
The z value for a cumulative probability of 95% can be computed with NORMSINV () and is shown
below.
Now,
z value=NORMSINV ( 0.95 )=1.6449
(x - μ)/σ = z
(x - 1.2)/0.2 = 1.6449
x = 1.2 + (0.2 × 1.6449) = 1.529
Therefore, it can be concluded that 95% of the download times for the home page of the website
are lower than 1.529 seconds.
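The percentile calculation in part (c) inverts the standardisation step. The sketch below assumes `NormalDist.inv_cdf` from Python's standard library as a stand-in for the spreadsheet function NORMSINV used in the original working.

```python
from statistics import NormalDist

mu, sigma = 1.2, 0.2

z_95 = NormalDist().inv_cdf(0.95)    # standard normal quantile for 95%
x_95 = mu + sigma * z_95             # rearranged from (x - mu) / sigma = z

print(f"z = {z_95:.4f}")
print(f"x = {x_95:.3f} seconds")

# Sanity check: the fitted distribution should place 95% of its mass below x_95
print(f"P(x < {x_95:.3f}) = {NormalDist(mu, sigma).cdf(x_95):.2f}")
```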
Part B
Introduction
In statistics, the normal distribution plays a crucial role because a majority of analyses and
statistical tools assume that the underlying distribution is normal. Also, for a distribution
that resembles the normal distribution, it is easier to estimate the population parameter from
the sample statistic. In order to ascertain whether a given distribution is normal or not, there
are a host of tests and theoretical properties that ought to be satisfied. Against this backdrop,
the key objective is to determine whether the cost of attending a baseball game adheres to a
normal distribution, with the aid of various statistical tools and the underlying theoretical
properties.
Testing of sample data
In light of the various theoretical properties of the normal distribution, coupled with statistical
techniques, the analysis of the given sample data is provided below.
(a) Box plot
The Five Number Summary for cost of the tickets is highlighted below.
Five Number Summary
Statistic          Cost ($)
Minimum            115.00
25th percentile    160.25
Median             179.00
75th percentile    220.00
Maximum            335.00
Box plot for cost of the tickets is highlighted below.
[Figure: Box Plot: Cost of attending a baseball game ($); axis scale 0 to 400]
It is apparent that the quartiles are not equal in width. For example, 25% of the values lie
between $160.25 and $179, whereas the highest 25% of the values lie between $220 and $335.
There is a clear presence of outliers on the higher side and of positive skew, which hints at
the given data being non-normally distributed (Eriksson & Kovalainen, 2015).
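The outlier and quartile-width observations can be checked directly from the five number summary. The sketch below assumes the common 1.5 × IQR fence rule for flagging outliers; the original does not state which rule its box plot used, so the rule and the variable names are assumptions.

```python
# Five number summary reported above (cost in $)
minimum, q1, median, q3, maximum = 115.00, 160.25, 179.00, 220.00, 335.00

iqr = q3 - q1                        # interquartile range
lower_fence = q1 - 1.5 * iqr         # assumed rule: 1.5 x IQR fences
upper_fence = q3 + 1.5 * iqr

has_low_outlier = minimum < lower_fence
has_high_outlier = maximum > upper_fence

# Widths of the four quarters; roughly equal mirror-image widths suggest symmetry
quarter_widths = [q1 - minimum, median - q1, q3 - median, maximum - q3]

print(f"IQR = {iqr}, fences = ({lower_fence}, {upper_fence})")
print(f"Outlier below lower fence: {has_low_outlier}")
print(f"Outlier above upper fence: {has_high_outlier}")
print(f"Quarter widths: {quarter_widths}")
```

Under this assumed rule, only the upper end of the data extends beyond a fence, and the top quarter is much wider than the others, which is consistent with the positive skew described above.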
(b) Histogram
Classes       Frequency
115 to 146    5
146 to 178    10
178 to 209    4
209 to 241    7
241 to 272    1
272 to 304    0
304 to 335    3
[Figure: Histogram: Cost of Attending a Baseball Game ($); x-axis: Cost of tickets ($), y-axis: Frequency]
The shape of the histogram is not symmetric and no bell curve shape seems to be formed. There
is skew in the data, as the tail on the right side is longer than the corresponding tail on the
left side (Hillier, 2016). Hence, positive skew is present in the given data, which implies that
it is quite likely a non-normal distribution (Flick, 2015).
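One way to make the histogram comparison concrete is to set the observed class frequencies against the frequencies a normal distribution would produce in the same classes. The sketch below is illustrative only: it assumes a normal curve fitted with the sample mean (194.70) and standard deviation (56.82) reported in the summary statistics of part (c), and it is not a formal goodness-of-fit test.

```python
from statistics import NormalDist

edges = [115, 146, 178, 209, 241, 272, 304, 335]   # class boundaries from the table
observed = [5, 10, 4, 7, 1, 0, 3]                  # class frequencies from the table
n = sum(observed)

# Assumption: normal curve with the sample mean and standard deviation from part (c)
fitted = NormalDist(mu=194.70, sigma=56.82)

# Expected count in each class = n x probability mass of the class under the fitted curve
expected = [n * (fitted.cdf(hi) - fitted.cdf(lo)) for lo, hi in zip(edges, edges[1:])]

for (lo, hi), obs, exp in zip(zip(edges, edges[1:]), observed, expected):
    print(f"{lo} to {hi}: observed {obs:2d}, expected under normality {exp:5.2f}")
```

The expected counts do not sum to 30 because the fitted curve places some probability below $115 and above $335, outside the observed classes.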
(c) Data characteristics to theoretical properties
Summary statistics
Statistic             Cost ($)
Mean                  194.70
Standard Error        10.37
Median                179.00
Mode                  227.00
Standard Deviation    56.82
Sample Variance       3228.63
Kurtosis              1.05
Skewness              1.09
Range                 220.00
Minimum               115.00
Maximum               335.00
Sum                   5841.00
Count                 30.00
The mean (194.70) comes out to be higher than the median (179.00). For a normal distribution
the two values are expected to converge. However, there is a significant divergence, which
hints at the distribution being non-normal for the given data (Hair et al., 2015).
IQR of the data has been computed based on the 25th percentile and 75th percentile.
IQR = 75th percentile - 25th percentile = 220 - 160.25 = 59.75
1.33 times the standard deviation = 1.33 × 56.82 = 75.57
It can be seen from the above that IQR is not same as 1.33 times of the standard deviation and
there seems to be quite significant difference which hints at the non-normality of the given data
(Koch, 2015).
Range is defined as the difference between the maximum and minimum values and comes out
to be 220.
6 times the standard deviation = 6 × 56.82 = 340.93 (using the unrounded standard deviation)
It can be seen from the above that range is not same as 6 times of the standard deviation. Also,
these are not even similar considering the wide difference between the two values.
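Both spread checks reduce to expressing the IQR and the range as multiples of the standard deviation. The sketch below does this with the reported summary values and also shows where the IQR rule of thumb comes from: the quartiles of a normal distribution lie at about z = ±0.6745, which gives a multiplier close to the rounded 1.33 used above. The use of Python's `statistics.NormalDist` is an assumption of this sketch.

```python
from statistics import NormalDist

sd = 56.82                           # reported sample standard deviation
iqr = 220.00 - 160.25                # 75th percentile minus 25th percentile
data_range = 335.00 - 115.00         # maximum minus minimum

# Theoretical IQR of a normal distribution, in units of the standard deviation
iqr_multiplier = 2 * NormalDist().inv_cdf(0.75)

iqr_in_sd = iqr / sd
range_in_sd = data_range / sd

print(f"Theoretical IQR multiplier: {iqr_multiplier:.3f}")
print(f"Observed IQR   = {iqr_in_sd:.2f} standard deviations (rule of thumb: about 1.33)")
print(f"Observed range = {range_in_sd:.2f} standard deviations (rule of thumb: about 6)")
```

Both observed ratios fall well short of their rule-of-thumb values, in line with the conclusions drawn in the text.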
Mean = 194.70 and standard deviation = 56.82
Mean ± 1 standard deviation = 194.70 ± (1 × 56.82) = 137.88 to 251.52
23 (76.67%) of the values fall within this range, against the 68.26% expected for a normal
distribution. Hence, it cannot be concluded that 68.26% of the values lie within ±1 standard
deviation of the mean.
Mean ± 1.28 standard deviations = 194.70 ± (1.28 × 56.82) = 121.97 to 267.43
25 (83.33%) of the cost values fall within this range and hence, it can be concluded that
more than 80% of the values lie within ±1.28 standard deviations of the mean.
Mean ± 2 standard deviations = 194.70 ± (2 × 56.82) = 81.06 to 308.34
27 (90%) of the cost values fall within this range and hence, it cannot be concluded that
95.44% of the values lie within ±2 standard deviations of the mean.
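The three coverage checks compare an observed proportion with the share of a normal distribution lying within k standard deviations of the mean. A minimal sketch of that comparison follows, using the counts reported above (23, 25 and 27 out of 30); the raw data are not reproduced in this document, so the counts are taken as given, and `statistics.NormalDist` is an assumed tool.

```python
from statistics import NormalDist

n = 30
# k (number of standard deviations) -> count of values reported inside mean ± k·SD
reported_counts = {1.0: 23, 1.28: 25, 2.0: 27}

std_normal = NormalDist()
theoretical = {}
observed = {}
for k, count in reported_counts.items():
    theoretical[k] = std_normal.cdf(k) - std_normal.cdf(-k)   # P(-k < z < k)
    observed[k] = count / n
    print(f"±{k} SD: observed {observed[k]:.2%}, normal theory {theoretical[k]:.2%}")
```

The theoretical shares computed this way differ from the quoted 68.26% and 95.44% only in the last decimal place, which reflects table rounding.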
Skew is not equal to zero and has a high positive value of 1.09, as apparent from the
summary statistics highlighted above.
Kurtosis is not equal to zero and has a high positive value of 1.05, as apparent from the summary
statistics highlighted above.
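For reference, skewness and kurtosis figures of this kind can be computed as in the sketch below. It assumes the bias-adjusted sample formulas used by common spreadsheet functions (kurtosis reported as excess kurtosis, so a normal distribution scores zero); the original does not state which formulas produced its figures. The sample values are hypothetical and are NOT the assignment data.

```python
from statistics import mean, stdev

def sample_skewness(values):
    """Bias-adjusted sample skewness (assumed spreadsheet-style formula)."""
    n = len(values)
    m, s = mean(values), stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)

def sample_excess_kurtosis(values):
    """Bias-adjusted sample excess kurtosis (assumed spreadsheet-style formula)."""
    n = len(values)
    m, s = mean(values), stdev(values)
    fourth = sum(((x - m) / s) ** 4 for x in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth - correction

# Hypothetical right-skewed costs, for illustration only (not the assignment data)
hypothetical_costs = [120, 150, 160, 175, 180, 200, 225, 330]

print(f"Skewness: {sample_skewness(hypothetical_costs):.2f}")
print(f"Excess kurtosis: {sample_excess_kurtosis(hypothetical_costs):.2f}")
```

A symmetric sample returns a skewness of zero under this formula, so a clearly positive value such as the reported 1.09 indicates a longer right tail.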
It is evident that the theoretical properties expected of data which is normal or approximately
normal are not satisfied by the given data. Hence, based on the given theoretical properties, it
would be prudent to conclude that the given data on expenses borne while attending one baseball
game cannot be considered normally distributed (Hastie, Tibshirani & Friedman, 2016).
(d) Quantile-Quantile Normal Probability Plot
[Figure: Quantile-Quantile Normal Probability Plot; axis labels: Cost of tickets ($) and Z score; series: Actual z-score for cost of tickets]
The normal probability plot clearly shows that the points do not follow a linear trend. Also,
there are certain outliers at both the higher end and the lower end, which is not expected for a
normal distribution. The middle values also deviate from the linear trend. Considering the
above, it is apparent that the given distribution cannot be assumed to be a normal distribution
(Fehr & Grossman, 2014).
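The points on a normal quantile-quantile plot pair each ordered observation with the z value a normal distribution would place at the same rank. The sketch below shows one way to build those pairs. It assumes the plotting positions (i - 0.5) / n, which is one common convention and not necessarily the one used for the plot above, and it uses hypothetical costs because the raw data are not reproduced in this document.

```python
from statistics import NormalDist, mean, stdev

def qq_points(values):
    """Return (theoretical z, observed z) pairs for a normal Q-Q plot."""
    n = len(values)
    m, s = mean(values), stdev(values)
    std_normal = NormalDist()
    ordered = sorted(values)
    # Assumed plotting positions: (i - 0.5) / n for rank i = 1..n
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    observed = [(x - m) / s for x in ordered]
    return list(zip(theoretical, observed))

# Hypothetical right-skewed costs, for illustration only (not the assignment data)
hypothetical_costs = [120, 150, 160, 175, 180, 200, 225, 330]

for theo, obs in qq_points(hypothetical_costs):
    print(f"theoretical z = {theo:6.3f}, observed z = {obs:6.3f}")
```

If the data were normal, the two columns would track each other closely and the plotted points would lie near a straight line; a largest observed z far above its theoretical counterpart signals a heavy right tail.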
Conclusion
The results of the various statistical analyses have been presented above with regard to the
given data. The box plot showed that the quartiles were not equal in width. Also, there was clear
evidence of outliers, especially on the higher side. The histogram is not symmetric and has a
longer right tail, which indicates positive skew. Further, the median and mean of the data do
not converge. Additionally, the skew and kurtosis of the data are non-zero. Besides, other
aspects related to the spread of the data are not satisfied. Considering the same, it is
reasonable to conclude that the given sample data on the cost of attending one baseball game
does not seem to adhere to a normal distribution. Further, even if the various criteria are
relaxed somewhat, approximate normality for the data still cannot be established.
References
Eriksson, P. & Kovalainen, A. (2015). Quantitative methods in business research (3rd ed.).
London: Sage Publications.
Fehr, F. H., & Grossman, G. (2013). An introduction to sets, probability and hypothesis testing
(3rd ed.). Ohio: Heath.
Flick, U. (2015). Introducing research methodology: A beginner's guide to doing a research
project (4th ed.). New York: Sage Publications.
Hair, J. F., Wolfinbarger, M., Money, A. H., Samouel, P., & Page, M. J. (2015). Essentials of
business research methods (2nd ed.). New York: Routledge.
Hastie, T., Tibshirani, R. & Friedman, J. (2016). The Elements of Statistical Learning (4th
ed.). New York: Springer Publications.
Hillier, F. (2016). Introduction to Operations Research (6th ed.). New York: McGraw Hill
Publications.
Koch, K.R. (2015). Parameter Estimation and Hypothesis Testing in Linear Models (2nd ed.).
London: Springer Science & Business Media.