Statistics and Data Analysis
Added on 2023/06/10
STATISTICS AND DATA ANALYSIS
Statistics and Data Analysis
Name of the student:
Name of the university:
Author’s note:
STATISTICS
Table of Contents
Answer 5.3
Answer 5.3.a
Answer 5.3.b
Answer 6.4
Answer 6.4.a
Answer 6.4.b
Answer 6.4.c
References
Answer 5.3.
[Figure: Scatterplot of Quality against End of Harvest (in days since August 31), with points distinguished by Rain at Harvest? (No / Yes)]
Answer 5.3.a.
The coefficient of the interaction term in this linear regression model is -0.08314, with a p-value of 0.012. Since the p-value is less than 0.05, the interaction between “End of Harvest” and “Rain” is significant at the 5% level of significance (Zou, Tuncali and Silverman 2003).
Hence, the rate at which the quality rating changes with the end-of-harvest date depends on whether there was unwanted rain at vintage.
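For reference, a minimal sketch of the interaction model behind Answers 5.3.a and 5.3.b, assuming Rain is a 0/1 indicator (0 = no unwanted rain at harvest, 1 = some unwanted rain) and writing End for “End of Harvest”; the β labels are ours, not taken from the original output. Under this model the slope on End differs between the two groups by β3, so testing the interaction term is a test of equal slopes.

```latex
\widehat{\text{Quality}} = \hat\beta_0 + \hat\beta_1\,\text{End} + \hat\beta_2\,\text{Rain} + \hat\beta_3\,(\text{End}\times\text{Rain})

\frac{\partial\,\widehat{\text{Quality}}}{\partial\,\text{End}} =
\begin{cases}
\hat\beta_1, & \text{Rain} = 0,\\
\hat\beta_1 + \hat\beta_3, & \text{Rain} = 1,
\end{cases}
\qquad H_0\colon \beta_3 = 0 \iff \text{equal slopes in both groups.}
```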
Answer 5.3.b.
With Rain coded 0 for “No unwanted rain at harvest” and 1 for “Some unwanted rain at harvest”, the fitted model gives a separate line for each group. The number of days of delay to the end of harvest needed to reduce the quality rating by 1 point is the reciprocal of the absolute slope of that line.
i) No unwanted rain at harvest (Rain = 0):
Quality = 5.16122 - 0.03145 * “End of Harvest”,
so a 1-point reduction in quality takes about 1/0.03145 ≈ 31.8 days of delay.
ii) Some unwanted rain at harvest (Rain = 1):
Quality = (5.16122 + 1.7867) + (-0.03145 - 0.08314) * “End of Harvest”
= 6.94792 - 0.11459 * “End of Harvest”,
so a 1-point reduction in quality takes about 1/0.11459 ≈ 8.7 days of delay.
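The arithmetic in 5.3.b can be reproduced with a short sketch. It uses the coefficients quoted above (intercept 5.16122, “End of Harvest” slope -0.03145, Rain 1.7867, interaction -0.08314) and assumes, as the answer does, that a 1-point fall in quality takes 1/|slope| days; the function names are illustrative only.

```python
# Coefficient estimates quoted in Answer 5.3. The End of Harvest slope is taken
# as -0.03145, the value consistent with the combined rain slope of -0.11459.
B0, B1, B2, B3 = 5.16122, -0.03145, 1.7867, -0.08314


def group_line(rain):
    """Return (intercept, slope) of Quality on End of Harvest for rain = 0 or 1."""
    return B0 + B2 * rain, B1 + B3 * rain


def days_per_point(slope):
    """Days of harvest delay that lower the predicted quality rating by 1 point."""
    return 1.0 / abs(slope)


for rain in (0, 1):
    intercept, slope = group_line(rain)
    print(f"rain={rain}: Quality = {intercept:.5f} + ({slope:.5f}) * EndofHarvest; "
          f"{days_per_point(slope):.1f} days per point")
```

Keeping the group lines as (intercept, slope) pairs makes the role of the interaction explicit: it changes only the slope, and the Rain dummy changes only the intercept.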
Answer 6.4.
[Figure: Pairwise scatterplots of KPOINT, RA, VTINV, DIPINV and HEAT]
Regression Plots:
[Figure: Regression diagnostic plots: Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage (with Cook's distance contours at 0.5 and 1). Observations 6, 23 and 29 are labelled in the first three panels; observations 12, 23 and 32 are labelled in the leverage panel.]
Residual plots:
[Figure: Standardised residuals plotted against each predictor of the Krafft data: RA, VTINV, DIPINV and HEAT]
Answer 6.4.a.
The fitted model is highly significant: the F-statistic is 115 on 4 and 27 degrees of freedom, and its p-value is far below 0.001. The coefficient of determination (R² = 0.9446) indicates that 94.46% of the variation in the dependent variable “KPOINT” is explained by the independent variables “RA”, “VTINV”, “DIPINV” and “HEAT” (Park 2011). At the 5% level of significance, “RA” and “HEAT” are significant predictors of “KPOINT”; “DIPINV” is significant at the 10% level but not at the 5% level, and “VTINV” is not significant. Overall, the multiple linear regression model appears adequate.
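The reported F-statistic and R² can be cross-checked against each other, since for a model with k predictors and n - k - 1 residual degrees of freedom, F = (R²/k) / ((1 - R²)/(n - k - 1)). A minimal sketch, assuming 0.9446 is the unadjusted multiple R²; the function name is hypothetical.

```python
def f_from_r2(r2: float, k: int, df_resid: int) -> float:
    """Overall F-statistic of a linear regression with k predictors,
    recovered from its coefficient of determination R^2."""
    return (r2 / k) / ((1 - r2) / df_resid)


# Figures reported in Answer 6.4.a: R^2 = 0.9446 on 4 and 27 degrees of freedom.
F = f_from_r2(0.9446, k=4, df_resid=27)
n_obs = 4 + 27 + 1  # predictors + residual df + intercept
print(f"F = {F:.1f} on 4 and 27 df; n = {n_obs} observations")
```

The recovered value agrees with the reported F of 115, and the degrees of freedom imply 32 observations.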
Answer 6.4.b.
The standardised residual plots for the predictors “Randic index (RA)” and “Reciprocal of the volume of the tail of the molecule (VTINV)” suggest that the model fits less well with respect to these two predictors than with respect to the other two, “Reciprocal of the dipole moment (DIPINV)” and “Heat of formation (HEAT)”. The standardised residuals are most scattered for “VTINV”, followed by “RA”.
The “Residuals vs Fitted”, “Scale-Location” and “Residuals vs Leverage” plots indicate that the overall fit of the multiple linear regression model is acceptable, and the “Normal Q-Q” plot shows that the residuals are approximately normally distributed.
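The quantities behind these plots can be illustrated with a minimal sketch for a simple (one-predictor) regression: leverages h_ii, standardised residuals e_i / (s·sqrt(1 - h_ii)) and Cook's distances r_i²·h_ii / (p·(1 - h_ii)). The data below are hypothetical, not the Krafft data, and the helper names are our own; a multiple regression uses the same formulas with the full hat matrix.

```python
import math


def simple_fit(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def diagnostics(x, y):
    """Leverages, standardised residuals and Cook's distances for a simple
    linear regression of y on x."""
    n, p = len(x), 2  # p = number of coefficients (intercept and slope)
    b0, b1 = simple_fit(x, y)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e * e for e in resid) / (n - p))  # residual standard error
    lev = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]  # hat values h_ii
    std = [e / (s * math.sqrt(1 - h)) for e, h in zip(resid, lev)]
    cook = [r * r * h / (p * (1 - h)) for r, h in zip(std, lev)]
    return lev, std, cook


# Hypothetical data; the last point is deliberately out of line.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 20.0]
lev, std, cook = diagnostics(x, y)
for i, (h, r, d) in enumerate(zip(lev, std, cook), start=1):
    flag = "  <- Cook's distance > 0.5" if d > 0.5 else ""
    print(f"obs {i}: leverage {h:.3f}, std. residual {r:+.2f}, Cook's D {d:.3f}{flag}")
```

Plotting `std` against each predictor gives the kind of standardised residual plot shown above, and the 0.5 threshold mirrors the inner Cook's distance contour in the leverage panel.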
Answer 6.4.c.
Jalali-Heravi and Knouz (2002) propose four criteria for selecting among competing regression models: the correlation coefficient (r), the standard deviation (s), the F-statistic for the statistical significance of the model, and the ratio of the number of observations to the number of descriptors in the equation. A critique based on these criteria is given below:
1. Larger values of the correlation coefficient (r) and the coefficient of determination (r²) indicate a better-fitting multiple regression model. However, r² never decreases when predictors are added, so a high r² alone can reflect over-fitting; the adjusted coefficient of determination, which penalises additional predictors, is a safer basis for comparison.
2. A lower standard deviation (s) of the residuals means that the observations lie closer to the fitted values, which gives more precise predictions of the response variable.
3. Between two models with the same dependent variable and broadly similar independent variables, the model with the larger F-statistic is preferred.
4. The number of predictors is also integral to choosing the better regression model. Adding predictors generally increases the explanatory power of the multiple linear regression model (Sheather 2009), but it lowers the ratio of observations to descriptors, which increases the risk of over-fitting.
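The caution in criterion 1 can be checked numerically. The sketch below, on small hypothetical data (not the Krafft data), fits nested least-squares models with and without an arbitrary extra column and reports R² and adjusted R². The helper names `ols_r2` and `adjusted_r2` are our own; the normal equations are solved by plain Gaussian elimination to keep the example dependency-free.

```python
def ols_r2(columns, y):
    """R^2 of a least-squares fit of y on an intercept plus the given predictor
    columns, solving the normal equations (X'X) b = X'y by Gaussian elimination."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(p)]
    for c in range(p):  # forward elimination with partial pivoting
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(c + 1, p):
            f = A[r][c] / A[c][c]
            for k in range(c, p + 1):
                A[r][k] -= f * A[c][k]
    b = [0.0] * p
    for c in reversed(range(p)):  # back substitution
        b[c] = (A[c][p] - sum(A[c][k] * b[k] for k in range(c + 1, p))) / A[c][c]
    fitted = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
    ybar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1 - sse / sst


def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a model with k predictors plus an intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical data for illustration only (not the Krafft data).
x = [1, 2, 3, 4, 5, 6, 7, 8]
extra = [3, 1, 4, 1, 5, 9, 2, 6]  # arbitrary extra predictor column
y = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1, 14.0, 15.9]

n = len(y)
r2_small = ols_r2([x], y)
r2_big = ols_r2([x, extra], y)
print(f"x only:      R^2 = {r2_small:.5f}, adjusted = {adjusted_r2(r2_small, n, 1):.5f}")
print(f"x and extra: R^2 = {r2_big:.5f}, adjusted = {adjusted_r2(r2_big, n, 2):.5f}")
```

Because the models are nested, R² for the larger model can never be lower, whereas adjusted R² rises only if the extra column explains enough variation to offset the lost degree of freedom.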
Therefore, it would be sensible to predict “KPOINT” mainly from the three predictors other than “VTINV” (“RA”, “DIPINV” and “HEAT”). The researcher may also consider adding other significant factors to this multiple linear regression model that might increase its explanatory power for the response variable.
References:
Park, S.H., 2011. Simple linear regression. In International Encyclopedia of Statistical
Science (pp. 1327-1328). Springer Berlin Heidelberg.
Sheather, S.J., 2009. Diagnostics and Transformations for Multiple Linear Regression. In A
Modern Approach to Regression with R (pp. 151-225). Springer, New York, NY.
Zou, K.H., Tuncali, K. and Silverman, S.G., 2003. Correlation and simple linear
regression. Radiology, 227(3), pp.617-628.