Statistics Study Material
Added on 2023/03/20
AI Summary
This document provides study material for statistics, covering topics such as confidence intervals, proportion estimation, regression analysis, and prediction intervals. It includes solved examples and explanations for each topic. The document also discusses the significance of different statistical tests and provides insights into the interpretation of regression models. Suitable for students studying statistics at the college or university level.
STATISTICS
Question 1
(a) 95% confidence interval for mean annual expenditure of all reader households in USA
Mean = $95.50
Standard deviation = $50
Sample size = 100
Standard error = Standard deviation / sqrt(Sample size) = 50 / sqrt(100) = 5
The t value for a 95% confidence interval = 1.98
Margin of error = t value * Standard error = 1.98 * 5 = 9.9
Lower limit of 95% confidence interval = Mean - Margin of error = 95.50 - 9.9 = 85.6
Upper limit of 95% confidence interval = Mean + Margin of error = 95.50 + 9.9 = 105.4
95% confidence interval = [$85.60, $105.40]
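The arithmetic in part (a) can be checked with a short Python sketch. The function name `mean_confidence_interval` is a hypothetical helper introduced here for illustration, and the t value of 1.98 is simply the figure quoted above rather than one computed from a t table.

```python
import math

def mean_confidence_interval(mean, sd, n, t_value):
    """Return (lower, upper) for mean +/- t_value * sd / sqrt(n)."""
    standard_error = sd / math.sqrt(n)   # 50 / sqrt(100) = 5
    margin = t_value * standard_error    # 1.98 * 5 = 9.9
    return mean - margin, mean + margin

# Values taken from part (a); t_value is the quoted 1.98 (an input, not derived here).
lower, upper = mean_confidence_interval(mean=95.50, sd=50, n=100, t_value=1.98)
print(round(lower, 2), round(upper, 2))
```

Passing the t value in as an argument keeps the sketch free of external libraries; a statistics package could supply the critical value instead.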
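(b) The interval in (a) remains valid even if household expenditure is not normally
distributed, because the sample is large (n = 100). By the central limit theorem, the
sampling distribution of the sample mean is then approximately normal whatever the shape of
the population, so the validity of the confidence interval is not adversely impacted. The t
statistic is used instead of the z statistic only because the population standard deviation
is estimated from the sample.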
(c) Number of households in the US = 120 million
Proportion of reader households in the sample, p = 100/1000 = 0.1
95% confidence interval for the proportion
The z value for a 95% confidence interval = 1.96
Standard error = sqrt(p*q/n) = sqrt(0.1*0.9/1000) = 0.009487, with q = 1 - p = 0.9
Margin of error = z value * Standard error = 1.96 * 0.009487 = 0.01859
Lower limit of 95% confidence interval = p - Margin of error = 0.1 - 0.01859 = 0.0814
Upper limit of 95% confidence interval = p + Margin of error = 0.1 + 0.01859 = 0.1186
95% confidence interval for population proportion = [0.0814, 0.1186]
Considering that there are 120 million households, the 95% confidence interval for the
number of reader households in the US = (0.0814 * 120 million, 0.1186 * 120 million) =
(9.768, 14.232) million.
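The proportion interval and its scaling to the household count can be reproduced as follows. This is a minimal sketch: `proportion_confidence_interval` is a hypothetical helper name, and the z value of 1.96 is the one stated in part (c).

```python
import math

def proportion_confidence_interval(successes, n, z_value):
    """Return (lower, upper) for p +/- z_value * sqrt(p * (1 - p) / n)."""
    p = successes / n
    standard_error = math.sqrt(p * (1 - p) / n)
    margin = z_value * standard_error
    return p - margin, p + margin

# 100 reader households in a sample of 1000, as in part (c).
lower, upper = proportion_confidence_interval(successes=100, n=1000, z_value=1.96)

# Scale the proportion limits to the 120 million US households.
households_millions = 120
count_interval = (lower * households_millions, upper * households_millions)
print([round(v, 4) for v in (lower, upper)], [round(v, 3) for v in count_interval])
```

The household counts differ from the hand calculation in the third decimal place only because the code does not round the proportion limits before multiplying.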
(d) Minimum sample size needs to be computed
Margin of error = 5
Confidence level = 95%
Standard deviation = 50
The z value for a 95% confidence interval = 1.96
Minimum sample size = (z value * Standard deviation / Margin of error)^2
Minimum sample size = (1.96 * 50 / 5)^2 = 384.16, rounded up to 385
Additional units required = 385 - 100 = 285
An additional 285 households from the US population must therefore be sampled in order to
satisfy the requirement.
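The sample-size rule in part (d) always rounds up, which is easy to get wrong by hand. A small sketch, assuming the hypothetical helper name `minimum_sample_size`:

```python
import math

def minimum_sample_size(z_value, sd, margin_of_error):
    """Smallest n with z_value * sd / sqrt(n) <= margin_of_error."""
    return math.ceil((z_value * sd / margin_of_error) ** 2)

required = minimum_sample_size(z_value=1.96, sd=50, margin_of_error=5)
additional = required - 100  # 100 households have already been sampled
print(required, additional)
```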
Question 2
(a) Percentage of variation in commercial cost that is explained by variation in Nielsen Rating
From the correlation matrix, the correlation coefficient between X1 and Y is 0.715.
Now,
Correlation coefficient = 0.715
Coefficient of determination (R square) = (Correlation coefficient)^2 = (0.715)^2 = 0.511.
The R square value represents the percentage of variation in the dependent variable that is
explained by variation in the independent variable. Therefore, only 51.1% of the variation in
commercial cost is explained by the corresponding variation in Nielsen Rating.
(b) ANOVA Table for the regression model with 4 independent variables is shown below.
ANOVA
Source        df          SS             MS        F    Significance F
Regression     3     909,360    303,120.000   57.632    0.000
Residual      41     215,640      5,259.512
Total         44   1,125,000
Degrees of freedom of regression = m - 1 = 4 - 1 = 3
Degrees of freedom of residual = n - m = 45 - 4 = 41
Degrees of freedom total = n - 1 = 45 - 1 = 44
SSE (given) = 215,640
SST (total) = 1,125,000
SSR = SST - SSE = 1,125,000 - 215,640 = 909,360
MSR (regression) = SSR / (m - 1) = 909,360 / 3 = 303,120
MSE (residual) = SSE / (n - m) = 215,640 / 41 = 5,259.512
F = MSR / MSE = 303,120 / 5,259.512 = 57.632
The significance F (p value) = 0.000
Since the p value is below 0.05, it can be concluded that the given regression model is
significant, as at least one slope is significantly different from zero.
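The ANOVA quantities follow mechanically from SST, SSE and the degrees of freedom, as this sketch shows. The helper name `anova_summary` is hypothetical, and the degrees of freedom are passed in exactly as stated in the working above rather than derived from the model.

```python
def anova_summary(sst, sse, df_regression, df_residual):
    """Derive SSR, MSR, MSE and F from the total and residual sums of squares."""
    ssr = sst - sse              # variation explained by the regression
    msr = ssr / df_regression    # mean square for regression
    mse = sse / df_residual      # mean square for residual
    return {"SSR": ssr, "MSR": msr, "MSE": mse, "F": msr / mse}

# Inputs as given in part (b).
table = anova_summary(sst=1_125_000, sse=215_640, df_regression=3, df_residual=41)
for name, value in table.items():
    print(f"{name}: {value:,.3f}")
```

The p value is not computed here because that would need an F distribution from a statistics library; the sketch only covers the arithmetic.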
However, when the individual slopes are analysed for significance, it is apparent that
variable X4 has an insignificant slope. This is because its t value = 10/7.79 = 1.28, and the
p value corresponding to this t statistic is greater than 0.05, implying that the slope is
not significant. As a result, the second model, which considers only three independent
variables and excludes X4, is preferable.
(c) The second model would be selected because variable X4 is insignificant.
Coefficient of determination r^2 = SSR / SST = 900,484 / 1,125,000 = 0.8004
The coefficient of determination indicates that 80.04% of the variation in Y (cost of airing
a 30 second commercial for a television program) is explained by the joint variation in the
independent variables X1 (Nielsen Rating of program), X2 (percentage of the audience in the
18-34 age group watching the program) and X3 (prime time).
(d) Prediction Interval
The t value for the 95% prediction interval at the applicable degrees of freedom = 2.01536
TWO AND A HALF MEN
X1 = 5.9, X2 = 80%, X3 = 1 (prime), X4 = 1 (comedy)
Yo = 15 + (32 * 5.9) + (1.10 * 80%) + (25 * 1) + (10 * 1) = 239.68 (thousands of dollars)
Yo = $239,680
σ^2 = SSE / (n - 2) = 215,640 / (45 - 2) = 5,014.88
Margin = 2.01536 * sqrt(5,014.88 * (1 + 1/45)) = 144.30 (thousands of dollars)
Lower limit = 239.68 - 144.30 = 95.38
Upper limit = 239.68 + 144.30 = 383.98
95% prediction interval = [95.38, 383.98] thousands of dollars, that is, roughly $95,380 to
$383,980
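Because Y and the residual variance are both expressed in thousands of dollars, the interval is easiest to evaluate in those units. The sketch below applies the simplified formula used in part (d), Yo +/- t * sqrt(σ^2 * (1 + 1/n)); the helper name `simple_prediction_interval` is hypothetical, and the t value is the quoted 2.01536 rather than one looked up in code.

```python
import math

def simple_prediction_interval(y_hat, sse, n, t_value):
    """Interval y_hat +/- t_value * sqrt(sigma2 * (1 + 1/n)), with sigma2 = SSE / (n - 2)."""
    sigma2 = sse / (n - 2)                            # 215,640 / 43
    margin = t_value * math.sqrt(sigma2 * (1 + 1 / n))
    return y_hat - margin, y_hat + margin

# y_hat is the fitted cost for Two and a Half Men, in thousands of dollars.
lower, upper = simple_prediction_interval(y_hat=239.68, sse=215_640, n=45, t_value=2.01536)
print(round(lower, 2), round(upper, 2))
```

Keeping every input in thousands of dollars avoids mixing a fitted value in dollars with a standard deviation in thousands.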
(e) Regression Equation
Y = 15 + 32 X1 + 1.10 X2 + 25 X3 + 10 X4
Program 1: NCIS
X1 = 8.2, X2 = 30%, X3 = 1 (prime), X4 = 0 (non-comedy)
Y = 15 + (32 * 8.2) + (1.10 * 30%) + (25 * 1) + (10 * 0) = 302.73
Program 2: TWO AND A HALF MEN
X1 = 5.9, X2 = 80%, X3 = 1 (prime), X4 = 1 (comedy)
Y = 15 + (32 * 5.9) + (1.10 * 80%) + (25 * 1) + (10 * 1) = 239.68
The difference between these two programs = 302.73 - 239.68 = 63.05 (thousands of dollars)
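The two fitted values can be compared with a small function that encodes the regression equation. This is an illustrative sketch: `predicted_cost` and the `programs` dictionary are names introduced here, and X2 is entered as a fraction (30% as 0.30) to match the hand calculation above.

```python
def predicted_cost(x1, x2, x3, x4):
    """Fitted cost of a 30 second commercial, in thousands of dollars."""
    return 15 + 32 * x1 + 1.10 * x2 + 25 * x3 + 10 * x4

# (Nielsen rating, share of audience aged 18-34 as a fraction, prime time, comedy)
programs = {
    "NCIS": (8.2, 0.30, 1, 0),
    "TWO AND A HALF MEN": (5.9, 0.80, 1, 1),
}

costs = {name: predicted_cost(*inputs) for name, inputs in programs.items()}
difference = costs["NCIS"] - costs["TWO AND A HALF MEN"]
print({name: round(cost, 2) for name, cost in costs.items()}, round(difference, 2))
```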
(f) 25th percentile
X1 = 5.0, X2 = 60%, X3 = 1 (prime), X4 = 0 (non-comedy)
Y = 15 + (32 * 5.0) + (1.10 * 60%) + (25 * 1) + (10 * 0) = 200.66
25th percentile = 200.66 / 2 = 100.33
This indicates that 25% of the values would lie below 100.33 (thousands of dollars).
(g) (i) Regression Equation
Y = 15 + 32 X1 + 1.10 X2 + 25 X3 + 10 X4
X1 = 43.3, X2 = 50%, X3 = 1 (prime), X4 = 0 (non-comedy)
Y = 15 + (32 * 43.3) + (1.10 * 50%) + (25 * 1) + (10 * 0) = 1426.15 (thousands of dollars)
Hence, the predicted cost of a 30 second commercial during Superbowl 42 would be $1,426,150.
(ii) Actual cost of a 30 second commercial = $2,700,000
Predicted cost of a 30 second commercial during Superbowl 42 = $1,426,150
Residual = $2,700,000 - $1,426,150 = $1,273,850
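The residual is the actual cost minus the fitted cost, with both expressed in dollars. A brief sketch, assuming the hypothetical helper name `predicted_cost_thousands` for the regression equation:

```python
def predicted_cost_thousands(x1, x2, x3, x4):
    """Regression equation from part (g)(i); result is in thousands of dollars."""
    return 15 + 32 * x1 + 1.10 * x2 + 25 * x3 + 10 * x4

# Superbowl 42 inputs, with the 50% audience share entered as 0.50.
predicted_dollars = predicted_cost_thousands(43.3, 0.50, 1, 0) * 1000
actual_dollars = 2_700_000
residual_dollars = actual_dollars - predicted_dollars  # actual minus fitted
print(f"{predicted_dollars:,.0f} {residual_dollars:,.0f}")
```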
(iii) The residual is very large, which shows that the given point lies far from the
regression line and can therefore be regarded as an outlier.
(iv) The model is fitted to the advertising cost of programs that are aired regularly. This
contrasts sharply with Superbowl 42, which is essentially an event that takes place for a
limited time only. As a result, the given statistical model cannot truly capture the
advertising cost related to Superbowl 42.