This document provides study material for statistics including solved assignments, essays, and dissertations. It covers topics such as hypothesis testing, regression analysis, and prediction models. The content is suitable for students studying statistics in college or university courses.
STATISTICS
Question 1

(a) Claim: The online retail store would be profitable when the average order exceeds $85.
As the population standard deviation of the order variable is not known, the relevant test statistic is t. The p value is lower than the significance level (assumed to be 5%), so there is sufficient evidence to reject the null hypothesis in favour of the alternative hypothesis. Hence, the data support the claim that the average order exceeds $85, the level at which the online retail store would be profitable.

(b) Claim: The proportion of people who have received an e-gift card for Christmas is higher than 20%.
Response coding: 1 = NO, 2 = YES
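A claim about a proportion of this kind is commonly checked with a one-sample test of a proportion. The sketch below is only an illustration under stated assumptions: it uses the normal approximation (a z test), the function name `proportion_z_test` is hypothetical, and the counts in the usage lines are made up for demonstration because the survey responses are not reproduced in this document.

```python
import math
from statistics import NormalDist


def proportion_z_test(successes, n, p0):
    """Right-tailed z test of H0: p <= p0 against Ha: p > p0.

    Uses the normal approximation, so it assumes n is reasonably large.
    """
    p_hat = successes / n                          # sample proportion
    standard_error = math.sqrt(p0 * (1 - p0) / n)  # standard error under H0
    z = (p_hat - p0) / standard_error
    p_value = 1 - NormalDist().cdf(z)              # upper-tail probability
    return z, p_value


# Hypothetical counts, NOT the assignment data: 30 YES responses out of 120.
z, p_value = proportion_z_test(successes=30, n=120, p0=0.20)
print(f"z = {z:.3f}, p value = {p_value:.4f}")
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```

The decision rule is the same one applied throughout this assignment: the null hypothesis is rejected only when the p value falls below the chosen significance level.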
The p value is higher than the significance level (5%), so there is insufficient evidence to reject the null hypothesis. Hence, the data do not support the claim that the proportion of people who have received an e-gift card for Christmas is higher than 20%.

(c) Claim: A significant difference is present between the means of the two appraisers.
Null hypothesis H0: μ_Appraiser1 − μ_Appraiser2 = 0
Alternative hypothesis Ha: μ_Appraiser1 − μ_Appraiser2 ≠ 0
As the population standard deviations of the two variables are unknown, the test statistic ought to be t. A two-sample independent t test is therefore computed in Excel.
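For readers working without Excel, the same kind of test can be sketched in code. This is a minimal illustration, not the computation used in the assignment: it assumes equal population variances (the pooled form), since the text does not say which Excel variant was run. The helper names and the sample values are hypothetical, and the p value is obtained by numerically integrating the t density rather than by calling a statistics library.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log(1 + x * x / df))


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p value: 1 - 2 * (area under the density from 0 to |t|).

    The area is approximated with Simpson's rule; steps must be even.
    """
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1 - 2 * total * h / 3)


def pooled_t_test(sample_1, sample_2):
    """Two-sample independent t test assuming equal variances."""
    n1, n2 = len(sample_1), len(sample_2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(sample_1)
                  + (n2 - 1) * variance(sample_2)) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(sample_1) - mean(sample_2)) / standard_error
    return t, df, two_tailed_p(t, df)


# Hypothetical appraisal values, NOT the assignment data.
appraiser_1 = [310, 295, 402, 288, 350]
appraiser_2 = [305, 300, 395, 292, 345]
t, df, p_value = pooled_t_test(appraiser_1, appraiser_2)
print(f"t = {t:.3f}, df = {df}, two-tailed p value = {p_value:.4f}")
```

A large two-tailed p value from such a test leads to the same decision as in the assignment: the null hypothesis of equal means is not rejected.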
This is a two-tailed hypothesis test, so the two-tailed p value is used. The two-tailed p value (0.7414) is higher than the significance level (5%), so there is insufficient evidence to reject the null hypothesis. Hence, it can be concluded that no significant difference is present between the means of the two appraisers in the assessments.

Question 2

(a) Multiple regression model
Least squares regression line equation:
Longevity = 3.244 + (0.451 * Mother) + (0.411 * Father) + (0.017 * Gmothers) + (0.087 * Gfathers)
(b) Interpretation of slope coefficients
Each interpretation holds the other variables in the model constant.
Mother: If the age of the mother changes by 1 year, the expected age of the person concerned changes by 0.451 years in the same direction.
Father: If the age of the father changes by 1 year, the expected age of the person concerned changes by 0.411 years in the same direction.
Gmother: If the age of the grandmother changes by 1 year, the expected age of the person concerned changes by 0.017 years in the same direction.
Gfather: If the age of the grandfather changes by 1 year, the expected age of the person concerned changes by 0.087 years in the same direction.

The requisite hypotheses for testing the significance of each slope are as follows.
Null hypothesis H0: β = 0 (slope is not significant)
Alternative hypothesis Ha: β ≠ 0 (slope is significant)
Significance level (alpha) = 0.05
Based on the p values of the slope coefficients, only the Mother and Father slope coefficients are statistically significant.

(c) Longevity of man = ?
When Mother = 75, Father = 75, Grandmothers = 77, Grandfathers = 73
Least squares regression line equation:
Longevity = 3.244 + (0.451 * Mother) + (0.411 * Father) + (0.017 * Gmothers) + (0.087 * Gfathers)
Longevity = 3.244 + (0.451 * 75) + (0.411 * 75) + (0.017 * 77) + (0.087 * 73)
Longevity = 75.5
Hence, the predicted longevity of a man for the given inputs is 75.5 years.
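The substitution in part (c) can be checked with a few lines of code. This is a minimal sketch using the rounded coefficients printed in the equation; the function name `predict_longevity` is hypothetical. The inputs are 75 for each parent, 77 for the grandmothers and 73 for the grandfathers, as substituted in the equation.

```python
INTERCEPT = 3.244
SLOPES = {"Mother": 0.451, "Father": 0.411, "Gmothers": 0.017, "Gfathers": 0.087}


def predict_longevity(**ages):
    """Evaluate the fitted equation for the given ancestor ages (in years)."""
    return INTERCEPT + sum(SLOPES[name] * ages[name] for name in SLOPES)


prediction = predict_longevity(Mother=75, Father=75, Gmothers=77, Gfathers=73)
print(f"Predicted longevity: {prediction:.2f} years")
```

With the rounded coefficients the expression evaluates to roughly 75.55 years, in line with the 75.5 reported; a small gap of this size would be expected if the reported figure was truncated or computed from unrounded coefficients.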
(d) Multiple regression model
Smoker: YES = 1, NO = 0
The key differences in the above model when compared with the original model are indicated below. The coefficients for mother and father have decreased, implying that their ages have a lower impact on the age of the child. Further, the significance of the coefficients for grandmother and grandfather has improved in this model, as the corresponding p values of the slopes have decreased.

(e) Interpretation of smoker dummy variable: If the person is a smoker, then the average age is lower by 3.719 years in comparison with a non-smoker, assuming all other parameters are the same.
The requisite hypotheses for testing the significance of the slope are as follows.
Null hypothesis H0: β = 0 (slope is not significant)
Alternative hypothesis Ha: β ≠ 0 (slope is significant)
Significance level (alpha) = 0.01
The p value for smoker is reported as zero, which is lower than the significance level, so the null hypothesis is rejected in favour of the alternative. Thus, it can be said that the smoker variable is statistically significant and that smoking affects the length of life significantly.

Question 3

(a) Multiple regression model
The regression equation is as follows.
Time = -28.427 + 0.604 * Boxes + 0.374 * Weight
The slope coefficients for the above regression model can be interpreted as shown below.
Boxes: If the number of boxes changes by 1, the time taken to unload changes by 0.604 minutes in the same direction. The positive sign of the slope is on expected lines, as more time is required to unload more boxes.
Weight: If the weight of the boxes changes by 100 kg, the time taken to unload changes by 0.374 minutes in the same direction. The positive sign of the slope is on expected lines, as more time is expected to be consumed in unloading a heavier load.

(b) Simple regression model with codes for time of day
The issue with the above model is that it assumes a uniform difference in the time taken to unload between morning and early afternoon and between early afternoon and late afternoon. This arises from the use of numerical codes that are equidistant from one another.

(c) Let the dummy variable be Codes, with code = 0 for morning and code = 1 for afternoon.

(d) Model B is better because, for this model, the Codes variable is also statistically significant, as indicated by the p value of zero for the slope coefficient of Codes. Further, there is an improvement in the R² value for Model B, which implies that the predictive power of this model is superior to that of Model A.
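The difference between the two coding schemes can be made concrete with a short sketch. It is illustrative only: the dictionary and function names are hypothetical, and the slope of 4.543 is the Codes coefficient that appears in the part (f) prediction, which uses the codes 1, 2 and 3.

```python
# Numeric coding (1, 2, 3): the fitted line treats the codes as equally spaced.
ORDINAL_CODES = {"morning": 1, "early afternoon": 2, "late afternoon": 3}
CODES_SLOPE = 4.543  # slope of Codes taken from the part (f) equation


def ordinal_effect(time_of_day):
    """Contribution of the time-of-day term under the 1/2/3 coding (minutes)."""
    return CODES_SLOPE * ORDINAL_CODES[time_of_day]


def dummy_code(time_of_day):
    """Dummy coding from part (c): 0 for morning, 1 for any afternoon slot."""
    return 0 if time_of_day == "morning" else 1


step_1 = ordinal_effect("early afternoon") - ordinal_effect("morning")
step_2 = ordinal_effect("late afternoon") - ordinal_effect("early afternoon")
print(f"Implied step, morning to early afternoon: {step_1:.3f} minutes")
print(f"Implied step, early to late afternoon:    {step_2:.3f} minutes")
print({slot: dummy_code(slot) for slot in ORDINAL_CODES})
```

Both implied steps are forced to equal the slope, which is exactly the restriction described in part (b); the dummy variable avoids imposing that spacing, at the cost of not distinguishing the two afternoon slots.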
(e) To establish whether the unloading time is significantly impacted by the time of day, it needs to be determined whether the underlying slope coefficient is statistically significant.
The requisite hypotheses for testing the significance of the slope are as follows.
Null hypothesis H0: β = 0 (slope is not significant)
Alternative hypothesis Ha: β ≠ 0 (slope is significant)
Significance level (alpha) = 0.05
The p value for the slope coefficient of the Codes variable is reported as zero, which is lower than the significance level, so the null hypothesis is rejected in favour of the alternative. Thus, it can be said that the unloading time is significantly dependent on the time of day when unloading takes place.

(f) Prediction of time required to unload truck = ?
Number of boxes = 100
Weight of boxes = 5000 kg
Three times of day = 1, 2, 3
Time = -41.422 + (0.644 * Boxes) + (0.349 * Weight) + (4.543 * Codes)
Time required to unload truck in the morning (code = 1):
Time = -41.422 + (0.644 * 100) + (0.349 * 5000) + (4.543 * 1) = 1774.85 minutes
Time required to unload truck in the early afternoon (code = 2):
Time = -41.422 + (0.644 * 100) + (0.349 * 5000) + (4.543 * 2) = 1779.39 minutes
Time required to unload truck in the late afternoon (code = 3):
Time = -41.422 + (0.644 * 100) + (0.349 * 5000) + (4.543 * 3) = 1783.93 minutes
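The three predictions in part (f) can be reproduced with a short sketch. It uses the rounded coefficients printed in the equation, and the function name `unloading_time` is hypothetical. Two points are assumptions rather than facts from the text. First, the rounded coefficients give values about 2.3 minutes below those reported, which would be consistent with the spreadsheet carrying unrounded coefficients. Second, Weight is entered as 5000 exactly as in the worked solution; if Weight is instead recorded in units of 100 kg, as the slope interpretation in part (a) suggests, the input would be 50 and the predictions would be far smaller.

```python
def unloading_time(boxes, weight, code):
    """Predicted unloading time in minutes from the part (f) equation."""
    return -41.422 + 0.644 * boxes + 0.349 * weight + 4.543 * code


TIME_OF_DAY = {1: "morning", 2: "early afternoon", 3: "late afternoon"}

for code, label in TIME_OF_DAY.items():
    # Weight entered as 5000, following the worked solution (see assumption above).
    minutes = unloading_time(boxes=100, weight=5000, code=code)
    print(f"{label:>15}: {minutes:.2f} minutes")
```

Whatever the weight unit, consecutive time-of-day codes differ by exactly the Codes slope of 4.543 minutes, which matches the spacing between the three reported predictions.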