Case Study: Multiple Linear Regression for Insurance Rates
Added on 2023/06/11
AI Summary
This case study uses multiple linear regression to predict average insurance rates from five predictor variables. It covers the scatterplot matrix, the regression equation, interpretation of the coefficients, the proportion of variation explained, residual plots, and an assessment of the predictor variables.
Running Header: Case Study
Case Study
Student’s name: Obaid Alshaali
Student’s ID:
Institution:
1. Draw a scatterplot matrix of the data for the six variables. What do these scatterplots tell you about the relationship among the variables?

Figure 1: Scatterplot matrix

The scatterplot matrix suggests roughly linear relationships among the variables. The variables are correlated with one another, but no pair appears to be highly correlated. The variables also appear to be approximately normally distributed.

2. Does a multiple linear regression equation relating insurance rate to the five predictor variables seem appropriate for these data? Explain your answer.
A multiple linear regression on the five predictors is appropriate for these data. The rationale is that a linear relationship can be assumed between the dependent variable and each of the independent variables. Moreover, severe multicollinearity is not a concern, since the independent variables are not too highly correlated with one another.

3. Find the multiple linear regression equation relating the response variable of insurance rate to the five predictor variables.

Figure 2: Regression analysis

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.86
R Square             0.74
Adjusted R Square    0.71
Standard Error       82.17
Observations         50

ANOVA
              df    SS            MS           F        Significance F
Regression    5     858326.90     171665.38    25.43    0.00
Residual      44    297055.42     6751.26
Total         49    1155382.32

                     Coefficients    Standard Error    t Stat    P-value
Intercept            82.25           123.20            0.67      0.51
Pop.Density          0.32            0.07              4.88      0.00
Auto theft rate      0.15            0.08              1.77      0.08
Deaths/100M miles    28.12           34.18             0.82      0.42
Ave. drive time      10.97           5.00              2.19      0.03
Hospital cost/day    0.16            0.08              1.98      0.05

From the regression model, the regression equation that can be derived is:
Ave.ins.rate = 82.25 + 0.32 Pop.Density + 0.15 Auto_theft_rate + 28.12 Deaths/100M_miles + 10.97 Ave.drive_time + 0.16 Hospital_cost/day

4. Interpret the sample regression coefficients.

From the regression equation, the intercept of 82.25 is the predicted average insurance rate when all five predictors equal zero. A one-unit increase in population density increases the predicted average insurance rate by 0.32 units, holding the other predictors constant. Similarly, a one-unit increase in auto theft rate, deaths per 100M miles, average drive time, or hospital cost per day increases the predicted average insurance rate by 0.15, 28.12, 10.97, and 0.16 units respectively, in each case holding the other predictors constant. At the 0.05 level, population density and average drive time are statistically significant, while the intercept, auto theft rate, and deaths per 100M miles are not. Hospital cost per day is borderline: its p-value rounds to 0.05, and its t statistic (1.98) falls just below the two-sided 5% critical value of about 2.02 for 44 degrees of freedom.

5. Determine the proportion of variation in the observed insurance rates that can be accounted for by the multiple linear regression equation in the five predictor variables.

From the regression statistics, R squared is 0.74, so about 74% of the variability in the observed insurance rates is accounted for by the five predictor variables. The remaining 26% is not explained by the model and may be due to variables or factors that are not in it. The adjusted R squared, which penalizes the number of predictors, is 0.71.
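The R Square, Adjusted R Square, F, and Standard Error figures in the summary output can be recomputed from the ANOVA sums of squares. The following is a minimal sketch that uses only the values reported in Figure 2; the variable names are illustrative and are not part of the original analysis.

```python
# Values copied from the ANOVA table in Figure 2 (reported to two decimals).
ss_regression = 858326.90
ss_residual = 297055.42
n_obs = 50          # number of observations
n_predictors = 5    # number of predictor variables

ss_total = ss_regression + ss_residual
df_residual = n_obs - n_predictors - 1

# R squared: share of the total variation accounted for by the regression.
r_squared = ss_regression / ss_total

# Adjusted R squared penalizes the number of predictors in the model.
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_residual

# Overall F statistic and residual standard error.
ms_regression = ss_regression / n_predictors
ms_residual = ss_residual / df_residual
f_statistic = ms_regression / ms_residual
standard_error = ms_residual ** 0.5

print(f"R squared          = {r_squared:.2f}")
print(f"Adjusted R squared = {adj_r_squared:.2f}")
print(f"F statistic        = {f_statistic:.2f}")
print(f"Standard error     = {standard_error:.2f}")
```

Because the inputs are rounded to two decimals, the recomputed values should agree with the summary output only up to rounding.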
6. Should all of the predictors remain in the regression equation?

No. From the regression model, auto theft rate (p = 0.08) and deaths per 100M miles (p = 0.42) are not statistically significant at the 0.05 level, so these two predictors should be removed from the model. Population density and average drive time are significant and should remain. Hospital cost per day is borderline (its p-value rounds to 0.05) and is retained here.

7. Construct residual plots and assess the appropriateness of the multiple linear regression equation.

Figure 3: Residual plots
8. Construct residual plots to assess the assumptions of constant conditional standard deviation and normality.

Figure 4: Standardized residual plots and normality table
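Standardized residuals of the kind plotted in Figure 4 can be computed along the following lines. This is a minimal sketch under one common definition (each residual divided by the residual standard error); the function name and the toy numbers are hypothetical and are not the insurance data, which are not reproduced in this document.

```python
import math


def standardized_residuals(observed, fitted, n_predictors):
    """Return each residual divided by the residual standard error."""
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    residuals = [y - y_hat for y, y_hat in zip(observed, fitted)]
    df_residual = len(observed) - n_predictors - 1
    if df_residual <= 0:
        raise ValueError("not enough observations for this many predictors")
    # Residual standard error: sqrt(SSE / (n - k - 1)).
    s = math.sqrt(sum(e * e for e in residuals) / df_residual)
    if s == 0:
        raise ValueError("all residuals are zero; nothing to standardize")
    return [e / s for e in residuals]


# Toy numbers for illustration only (not taken from the case study).
observed = [10.0, 12.0, 15.0, 19.0, 24.0]
fitted = [11.0, 12.0, 14.0, 20.0, 23.0]
z = standardized_residuals(observed, fitted, n_predictors=1)

# Values far outside roughly -2..2 would deserve a closer look.
flagged = [i for i, value in enumerate(z) if abs(value) > 2]
print([round(value, 3) for value in z], flagged)
```

Plotting these values against the fitted values gives a visual check of constant spread, and a normal probability plot of the same values gives a check of normality.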
9. Do you think these predictor variables do a good job of predicting the response?

All of the predictors do a reasonably good job of predicting the response, with the exception of deaths per 100M miles. Unlike the other variables, the deaths per 100M miles variable does not show a clear line of fit in the plots from question 8 above. Consistent with this, the variable is not statistically significant in the regression model (p = 0.42).
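The significance statements in questions 4, 6, and 9 can be checked against the t statistics reported in Figure 2. The sketch below compares each reported t statistic with the two-sided 5% critical value for 44 residual degrees of freedom. The critical value of about 2.015 is taken from standard t tables and is an assumption of this sketch, not a figure from the original output.

```python
# Two-sided 5% critical value of Student's t with 44 degrees of freedom
# (approximate, from standard t tables; not part of the original output).
T_CRITICAL = 2.015

# t statistics as reported in the coefficient table of Figure 2.
t_stats = {
    "Pop.Density": 4.88,
    "Auto theft rate": 1.77,
    "Deaths/100M miles": 0.82,
    "Ave. drive time": 2.19,
    "Hospital cost/day": 1.98,
}

# A predictor is significant at the 0.05 level when |t| exceeds the critical value.
significant = {name: abs(t) > T_CRITICAL for name, t in t_stats.items()}

for name, t in t_stats.items():
    verdict = "significant" if significant[name] else "not significant"
    print(f"{name:<18} t = {t:5.2f}  {verdict} at the 0.05 level")
```

Under this check, hospital cost per day falls just short of the cutoff, which agrees with its reported p-value rounding to 0.05.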