EDF 5400 Problem Set: Statistical Analysis and Interpretation of Data


Added on  2023/06/03

EDF 5400 PROBLEM SET
Question 1
For Fall 2017
Mean = 77
Standard deviation = 5
For Fall 2016
Mean = 59
Standard deviation = 4
Distribution of scores in both fall terms = Normal distribution
a) Score of 61 in Fall 2016
z = (61 - 59) / 4 = 0.5
Now, let the equivalent score in Fall 2017 be x
(x - 77) / 5 = 0.5
x = 79.5
b) Score of 80 in Fall 2017
z = (80 - 77) / 5 = 0.6
Now, let the equivalent score in Fall 2016 be x
(x - 59) / 4 = 0.6
x = 61.4
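The conversions in Question 1 can be checked with a short script. This is a minimal sketch that uses only the means and standard deviations stated above; the names z_score and equivalent_score are illustrative and not part of the assignment. Each score is standardised with the standard deviation of its own term, so part (b) uses the Fall 2017 value of 5.

```python
# Means and standard deviations as stated in Question 1.
FALL_2017 = {"mean": 77.0, "sd": 5.0}
FALL_2016 = {"mean": 59.0, "sd": 4.0}


def z_score(x, dist):
    """Standardise a raw score x against a distribution's mean and SD."""
    return (x - dist["mean"]) / dist["sd"]


def equivalent_score(x, source, target):
    """Return the score in `target` that has the same z-score as x in `source`."""
    return target["mean"] + z_score(x, source) * target["sd"]


# (a) A score of 61 in Fall 2016, expressed on the Fall 2017 scale.
a_z = z_score(61, FALL_2016)
a_x = equivalent_score(61, FALL_2016, FALL_2017)

# (b) A score of 80 in Fall 2017, expressed on the Fall 2016 scale.
b_z = z_score(80, FALL_2017)
b_x = equivalent_score(80, FALL_2017, FALL_2016)

print(f"(a) z = {a_z:.2f}, equivalent Fall 2017 score = {a_x:.1f}")
print(f"(b) z = {b_z:.2f}, equivalent Fall 2016 score = {b_x:.1f}")
```

Keeping the two terms as separate parameter sets makes it harder to mix up which standard deviation belongs to which score.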
Question 2
Normal distribution
Proportions are computed with the Excel function NORMSDIST(z).
a) Proportion between z = -0.93 and z = 0.93
P(-0.93 < z < 0.93) = 0.8238 - 0.1762 = 0.6476
b) Proportion between z = 0 and z = 0.98
P(0 < z < 0.98) = 0.8365 - 0.5 = 0.3365
c) Proportion below z = 1.80
P(z < 1.80) = 0.9641
d) Proportion above z = 1.67
P(z > 1.67) = 1 - P(z < 1.67) = 1 - 0.9525 = 0.0475
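The four proportions in Question 2 can also be reproduced without Excel. The sketch below assumes only the Python standard library and computes the standard normal cumulative distribution, which is the quantity NORMSDIST(z) returns, from the error function; the helper name phi is illustrative.

```python
import math


def phi(z):
    """Standard normal CDF: the proportion of the distribution below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


p_a = phi(0.93) - phi(-0.93)  # (a) between z = -0.93 and z = 0.93
p_b = phi(0.98) - phi(0.0)    # (b) between z = 0 and z = 0.98
p_c = phi(1.80)               # (c) below z = 1.80
p_d = 1.0 - phi(1.67)         # (d) above z = 1.67

for label, p in zip("abcd", (p_a, p_b, p_c, p_d)):
    print(f"({label}) {p:.4f}")
```

Every case is a difference of cumulative values: an interval is the upper CDF value minus the lower one, and an upper tail is one minus the CDF.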
Question 3
(a) The requisite histograms for the two variables are indicated below.
Each histogram shows a left (negative) skew, since the tail on the left is longer than the tail on
the right.
(b) The requisite scatterplot is indicated as follows.
(c) There is a positive relationship between IQ and Test Score, since the scatter points follow
an upward (positive) slope. The strength of the relationship is moderately high: the points do not
follow a strictly linear pattern and deviate from the best fit line, but these deviations do not
appear to be large in magnitude.
(d) The correlation coefficient was computed in Excel using the CORREL function and is 0.745.
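For reference, CORREL returns the Pearson correlation coefficient, which can be sketched from its definition as follows. The assignment's data set is not reproduced in this document, so the IQ and test score values below are hypothetical placeholders for illustration only and will not give 0.745; the function name pearson_r is likewise illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares of the deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for illustration only; NOT the assignment's data.
iq = [92, 100, 104, 110, 118, 125]
test_score = [61, 70, 66, 78, 80, 90]

r = pearson_r(iq, test_score)
print(f"r = {r:.3f}")
```

The sign of r gives the direction of the relationship, and values closer to 1 or -1 indicate a tighter fit to a straight line.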
(e) The new variable has been created and is indicated below.
(f) The histogram of the new variable is indicated as follows.
In this histogram the level of skew appears to have declined, since the tails on either side are
more evenly balanced.
(g) The requisite scatter plot is indicated as follows.
(h) The new scatter plot has a negative slope, unlike the original scatter plot. The relationship
between the variables also appears weaker in the new scatter plot than in the original one.
(i) The correlation coefficient between IQ and New_Score is -0.69. Compared with the original
correlation coefficient, the new one is not only weaker in strength but also negative, whereas the
original relationship was positive.
(j) It would be better to use the original test score, since intuitively higher IQ levels should
lead to higher test scores. With New_Score, the correlation with IQ is negative, which is not
intuitive and therefore not preferred.
Question 4
a) The independent variable in this case is the height of the student, while the dependent
variable is the weight of the student, because a student's weight typically depends on height.
The correlation and regression analysis reflect the same relationship.
b) There is a positive relationship between the height and weight of students, which is apparent
from the positive correlation coefficient and the positive slope of the best fit line in the
scatter plot. Hence, height and weight would be expected to move in the same direction. Further,
the relationship is strong, and the correlation coefficient is significant at the 1% level of
significance, which implies that the linear relationship between the variables is statistically
significant.
c) The regression model is indicated as follows.
College Student's Weight = -199.536 + 5.006 * College Student's Height
d) The intercept is -199.536, which is the predicted weight of a student whose height is zero.
This is not practically meaningful, since no student has a height of zero and a weight cannot be
negative; the interpretation is purely theoretical.
e) The slope is 5.006, which implies that when the height of a college student changes by 1 unit,
the predicted weight changes by 5.006 units in the same direction.
f) Mean height (from descriptive statistics) = 68.7174
College Student Weight = -199.536 + 5.006*68.7174 = 144.46 units
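The prediction in part (f) can be checked with a few lines of code. This is a minimal sketch that uses the intercept, slope, and mean height reported above; the function name predict_weight is illustrative.

```python
# Coefficients and mean height as reported in parts (c) and (f).
INTERCEPT = -199.536
SLOPE = 5.006
MEAN_HEIGHT = 68.7174


def predict_weight(height):
    """Predicted weight from the fitted line: intercept + slope * height."""
    return INTERCEPT + SLOPE * height


predicted = predict_weight(MEAN_HEIGHT)
print(f"Predicted weight at mean height: {predicted:.2f}")

# Raising height by one unit raises the prediction by the slope.
slope_check = predict_weight(MEAN_HEIGHT + 1) - predicted
print(f"Change per unit of height: {slope_check:.3f}")
```

The second calculation mirrors the slope interpretation in part (e): the difference between two predictions one unit apart equals the slope.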
g) The required information can be determined from the coefficient of determination (R²) in the
regression output. Its value is 0.600, which implies that 60% of the variation in weight is
explained by the given model.
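In a simple linear regression with one predictor, the coefficient of determination equals the square of the correlation coefficient, so the two can be related directly. The sketch below assumes the reported R-squared of 0.600 and the positive slope from part (c); because 0.600 is a rounded figure, the resulting correlation is approximate.

```python
import math

R_SQUARED = 0.600  # coefficient of determination from the regression output
SLOPE = 5.006      # slope from part (c); its sign gives the sign of r

# With a single predictor, r has the sign of the slope and magnitude sqrt(R^2).
r = math.copysign(math.sqrt(R_SQUARED), SLOPE)

explained = R_SQUARED * 100          # percent of variation explained by height
unexplained = (1 - R_SQUARED) * 100  # percent left unexplained

print(f"r is approximately {r:.3f}")
print(f"Explained: {explained:.0f}%, unexplained: {unexplained:.0f}%")
```

The implied correlation also shows the share of variation in weight that height does not account for.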
h) Yes, the assumption of normality of the residuals appears to be met, as the residuals
approximately follow a bell curve and can therefore be assumed to be normally distributed.
i)