Statistics Homework: Regression Analysis and Autocorrelation Testing

Test for multicollinearity for data set 1
Interpretation: Multicollinearity arises when the predictor variables are highly correlated with each other. It can be detected through correlation coefficients, the condition index and Variance Inflation Factors (VIF).
It can be seen from the correlations table that the Pearson correlation coefficients for most pairs of variables come out lower than 0.8, which implies that multicollinearity does not exist between them. However, for the pair X1 and X6 the value is above 0.8, which indicates that collinearity is very likely to exist between these two variables.
According to the rule of thumb, a VIF value higher than 10 is problematic, as it shows that the regression coefficients are poorly estimated because of the presence of multicollinearity. Based on the coefficients output and the collinearity statistics, it can be seen that the VIF values for all the variables come out between 1 and 5, which implies that no multicollinearity symptoms are present in the data set. The tolerance values lead to the same conclusion: tolerance is the reciprocal of VIF, and since the tolerance value for every variable is not lower than 0.10, no multicollinearity is indicated.
A condition index higher than 30 would signal a serious multicollinearity problem. Here, the condition index comes out lower than 15, which again implies that multicollinearity symptoms are not present in the data.
However, it would still be recommended to eliminate the variable X6, because its high correlation with X1 shows that some collinearity exists.
Test for heteroscedasticity for data set 1
Interpretation: It is important to determine whether the residual variance differs from one period of observation to another. When the regression model is a good fit, there is no heteroscedasticity problem. Further, as per the decision rule, when the significance value is lower than 0.05 there is a heteroscedasticity problem, whereas when the value is higher than 0.05 there is not.
For about half of the variables the significance value is lower than 0.05 and for the rest it is higher than 0.05, so no firm conclusion can be drawn from the coefficients alone. Further, if the residual plot shows a distinctive fan or cone shape, that would indicate heteroscedasticity, because the variance of the residuals grows as the fitted values increase. No cone-shaped pattern is present in the residual plot, and thus no heteroscedasticity exists in data set 1: an increase in the fitted values does not increase the variance of the residuals.
Stepwise regression for data set 1
In the model summary table, it can be seen from the R-square and Significance F Change columns that the R-square value is higher than 0.5 and close to the highest possible theoretical value of 1; hence, the model is considered a good fit. Further, the significance F value comes out lower than 0.05 (assuming a 5% significance level), which implies that the variables in the regression model are significant.
It is evident from the above that the tolerance value in Model I for variable X3 is lower than 0.75, and hence it would be prudent to delete this predictor variable. Similarly, in Model II, variables X3 and X6 show tolerance values lower than 0.75. Also, in Model III and Model IV the tolerance value is lower than 0.75 for variable X6.
Considering the significance F value, the best regression model would be Model III, with X2, X5 and X6 as the independent variables. However, the tolerance value for X6 is not higher than 0.75, and thus it is essential to eliminate this variable or substitute some other predictor variable for X6.
Test for first order autocorrelation for data set 2
The Durbin-Watson statistic is the relevant test for first-order autocorrelation in the variable of interest when the data form a time series. The Durbin-Watson statistic ranges from 0 to 4. A value close to 0 indicates likely positive first-order autocorrelation; a value close to 2 indicates no autocorrelation; and a value close to 4 indicates negative first-order autocorrelation.
It can be seen from the output that the Durbin-Watson statistic comes out to be 0.296, and hence there is strong statistical evidence to conclude that a high level of positive first-order autocorrelation is present.
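The statistic itself is simple to compute: it is the sum of squared successive residual differences divided by the sum of squared residuals, which for AR(1) errors is approximately 2(1 - rho). The sketch below checks a manual computation against statsmodels on simulated residuals with strong positive autocorrelation (placeholder data, not data set 2):

```python
import numpy as np
from statsmodels.stats.stattools import durbin_watson

# Simulated AR(1) residuals with strong positive autocorrelation
# (rho = 0.9), which should push the statistic well below 2.
rng = np.random.default_rng(3)
n = 200
e = np.empty(n)
e[0] = rng.normal()
for t in range(1, n):
    e[t] = 0.9 * e[t - 1] + rng.normal()

# DW = sum((e_t - e_{t-1})^2) / sum(e_t^2)
dw_manual = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
dw = durbin_watson(e)
print(round(dw, 3))
```

Since DW is roughly 2(1 - rho), a value of 0.296 as observed above corresponds to an estimated rho of about 0.85, i.e. strong positive first-order autocorrelation.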
In order to remove this first-order autocorrelation, the key step is a transformation of the variables. One of the most versatile methods is the Cochrane-Orcutt procedure, which transforms the data so as to reduce or eliminate the first-order autocorrelation.