This research report discusses the patterns of financial benefits paid by BUPA, an Australian health insurance company. It includes preliminary statistics, model estimation, dummy variables, quadratic specification, model specification and model design.
RESEARCH REPORT ON ECONOMETRICS

Executive summary
In this research report, immediately after the introduction, we discuss the preliminary statistics, which comprise the descriptive statistics, a scatter plot and the results tables. We then discuss model estimation, dummy variables, the quadratic specification and model specification and, last but not least, model design.

Introduction
This research report uses the given dataset of 38 variables and 6,786 observations to analyse the patterns of financial benefits paid by BUPA, an Australian health insurance company. The attributes describe both the features of the insurance products and the characteristics of the customers. The main aim of this research is to predict total benefits and hospital benefits.

1) Preliminary statistics
a)
Variable | Obs | Mean | Std. Dev. | Min | Max
tenure_years | 6,785 | 11.53378 | 9.438747 | 0 | 42.4
total_benefit | 6,786 | 4581.219 | 9144.589 | 0 | 95971.84
contributor_age | 6,785 | 49.24628 | 16.04881 | 8 | 104
daily_premium | 6,785 | 12.61911 | 6.615719 | .74 | 40.66
lives_on_membership | 6,785 | 2.391304 | 1.416717 | 1 | 8
calls_to_bupa | 6,785 | 2.046721 | 1.703397 | 1 | 24
Figure 1

b) Figure 1 shows that total_benefit has 6,786 observations, whereas each of the other variables has 6,785. The mean of calls_to_bupa (2.05) is close to the mean of lives_on_membership (2.39). The mean of total_benefit (4,581.22) is far higher than that of any other variable in Figure 1, and total_benefit also has the largest standard deviation, approximately 9,144.59.

c) [Figure 2: scatter plot of total_benefit (0 to 100,000) against calls_to_bupa (0 to 25)]
Figure 2
The largest value of calls_to_bupa is close to 25 (24 in Figure 1), while the largest value of total_benefit is close to 100,000 (95,971.84 in Figure 1). Most observations have 10 or fewer calls to BUPA. This clustering does not by itself show that the two variables are correlated; the evidence of a positive association comes from the positive, statistically significant coefficient on calls_to_bupa in the section 2 regression (541.19, p-value 0.000).

2) Model estimation
a)
Dependent variable: total_benefit
Variable | Coef. | Std. Err. | t | p-value | 95% CI lower | 95% CI upper
tenure_years | 32.11981 | 14.27155 | 2.25 | 0.024 | 4.1431 | 60.09653
contributor_age | 94.0448 | 8.561886 | 10.98 | 0.000 | 77.26082 | 110.8288
daily_premium | 254.6578 | 20.17264 | 12.62 | 0.000 | 215.1131 | 294.2025
lives_on_membership | -334.5192 | 89.91961 | -3.72 | 0.000 | -510.7898 | -158.2485
calls_to_bupa | 541.1893 | 62.41209 | 8.67 | 0.000 | 418.842 | 663.5366
_cons | -3941.208 | 444.4015 | -8.87 | 0.000 | -4812.374 | -3070.042
Figure 3

b) The estimated model is:
total_benefit = -3941.21 + 32.12*tenure_years + 94.04*contributor_age + 254.66*daily_premium - 334.52*lives_on_membership + 541.19*calls_to_bupa
Holding the other regressors constant, a $1.50 decrease in daily_premium changes predicted total benefit by -1.5*254.66 = -$381.99, so total benefit falls by about $382. If lives_on_membership increases by 2, predicted total benefit changes by 2*(-334.52) = -$669.04, so it falls by about $669.

c) If this is the correct model specification, tenure_years has a positive and statistically significant effect on total benefit at the 5% level: its p-value of 0.024 is below 0.05, and equivalently its t-statistic of 2.25 exceeds the two-sided 5% critical value of about 1.96. Each additional year of tenure is associated with about $32.12 more in total benefit, holding the other regressors constant.

3) Dummy variables
a)
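Dependent variable: total_benefit
Variable | Coef. | Std. Err. | t | p-value | 95% CI lower | 95% CI upper
_Iproduct_~2 | -4625.837 | 514.8882 | -8.98 | 0.000 | -5635.18 | -3616.494
_Iproduct_~3 | -651.5264 | 383.4755 | -1.70 | 0.089 | -1403.259 | 100.206
_Iproduct_~4 | 0 (omitted)
package1 | 0 (omitted)
package2 | 0 (omitted)
package3 | 0 (omitted)
package4 | -4463.642 | 1816.715 | -2.46 | 0.014 | -8024.975 | -902.3092
state1 | -927.0557 | 790.3415 | -1.17 | 0.241 | -2476.373 | 622.2619
state2 | -2303.604 | 1422.861 | -1.62 | 0.105 | -5092.859 | 485.651
state3 | 391.729 | 800.4683 | 0.49 | 0.625 | -1177.44 | 1960.898
state4 | -452.4229 | 820.2857 | -0.55 | 0.581 | -2060.441 | 1155.595
state5 | 0 (omitted)
state6 | -638.4669 | 798.4277 | -0.80 | 0.424 | -2203.636 | 926.7023
state7 | -1800.408 | 879.0925 | -2.05 | 0.041 | -3523.706 | -77.11072
_cons | 6108.465 | 844.4256 | 7.23 | 0.000 | 4453.125 | 7763.804
Figure 4

b) In Figure 4 the estimated coefficient on the “HOSPITAL AND EXTRAS” product (_Iproduct_~3) is -651.53. Its p-value of 0.089 is greater than 0.05, and its t-statistic of -1.70 is smaller in absolute value than the 5% critical value of about 1.96. The coefficient is therefore not statistically significant at the 5% level (although it is significant at the 10% level), so the data do not provide strong evidence that total_benefit for “HOSPITAL AND EXTRAS” differs from that of the base product category, and this dummy variable adds little to the prediction of total_benefit.

4) Quadratic specification
a)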
Dependent variable: total_benefit
Variable | Coef. | Std. Err. | t | p-value | 95% CI lower | 95% CI upper
contributor_age | -17.16787 | 41.31587 | -0.42 | 0.678 | -98.15994 | 63.8242
contributor_age_sq | 1.499754 | .3907289 | 3.84 | 0.000 | .733803 | 2.265705
_cons | 1403.924 | 1020.353 | 1.38 | 0.169 | -596.2871 | 3404.135
Figure 5

b) total_benefit = 1403.924 - 17.17*contributor_age + 1.50*contributor_age_sq
Because contributor_age_sq is the square of contributor_age, the marginal effect of contributor_age on total benefit depends on age:
d(total_benefit)/d(contributor_age) = -17.17 + 2*1.50*contributor_age
This marginal effect is zero at the turning point contributor_age = 17.17 / (2*1.50), or about 5.72 years; it is negative below that age and positive above it. Every contributor in the sample is older than this (Figure 1), so the estimated marginal effect of age is positive for all observed contributors and grows with age. At the mean age of about 49 it is roughly $130 per additional year.

5) Model specification
The R-squared, or coefficient of determination, is the share of the sample variation in the dependent variable that is explained by the regressors. For the model in question 4 the R-squared is 0.06, so contributor_age and contributor_age_sq together explain only about 6% of the variation in total_benefit. This is a low value: the model fits the data poorly, and most of the variation in total benefit comes from factors the model leaves out. Additional explanatory variables that could improve the goodness of fit include:
- the state where the membership is held
- the gender of the main policy holder
- the description of the family structure (where "plus" refers to a dependent aged 21-25 on a parent's policy)
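The R-squared definition can be made concrete as R² = 1 - SSR/SST, where SSR is the residual sum of squares and SST is the total sum of squares around the mean. The sketch below is a minimal illustration on toy numbers that are not from the BUPA dataset; the helper name r_squared is hypothetical.

```python
import numpy as np


def r_squared(y, y_hat):
    """R^2 = 1 - SSR / SST: the share of the sample variation in y
    that the fitted values y_hat account for."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ssr = np.sum((y - y_hat) ** 2)      # residual (unexplained) variation
    sst = np.sum((y - y.mean()) ** 2)   # total variation around the mean
    return 1.0 - ssr / sst


# Toy numbers for illustration only; they are not from the BUPA dataset.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])
slope, intercept = np.polyfit(x, y, deg=1)   # OLS line with an intercept
print(round(r_squared(y, intercept + slope * x), 3))  # 0.998
```

Read the same way, an R-squared of 0.06 means the fitted values from the age terms account for only 6% of the variation in total_benefit.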
Omitting potentially relevant explanatory variables has two main consequences. If an omitted variable affects total_benefit and is correlated with an included regressor, the OLS coefficients on the included regressors suffer from omitted variable bias and are inconsistent, so they cannot be given a causal interpretation. Even when the omitted variables are uncorrelated with the included ones, leaving them out lowers the R-squared and makes predictions less precise.

6) Model design
a)
Variable | Obs | Mean | Std. Dev. | Min | Max
total_benefit_paid_hospital | 6,786 | 2937.928 | 8080.162 | 0 | 82794
_Imembersh~2 | 6,785 | .0084009 | .0912773 | 0 | 1
_Imembersh~3 | 6,785 | .2187178 | .4134072 | 0 | 1
_Imembersh~4 | 6,785 | .13972 | .3467218 | 0 | 1
_Imembersh~5 | 6,785 | .0206338 | .1421653 | 0 | 1
_Imembersh~6 | 6,785 | .2374355 | .4255427 | 0 | 1
_Imembersh~7 | 6,785 | .0645542 | .2457556 | 0 | 1
family1 | 6,785 | .1585851 | .3653157 | 0 | 1
family2 | 6,785 | .3995578 | .4898436 | 0 | 1
family3 | 6,785 | .0343405 | .1821156 | 0 | 1
family4 | 6,785 | .3709654 | .4830988 | 0 | 1
family5 | 6,785 | .0344878 | .182492 | 0 | 1
family6 | 6,785 | .0020634 | .0453808 | 0 | 1
Figure 6

b)
Dependent variable: total_benefit_paid_hospital
Variable | Coef. | Std. Err. | t | p-value | 95% CI lower | 95% CI upper
family1 | 1819.096 | 582.5797 | 3.12 | 0.002 | 677.0564 | 2961.135
family2 | 123.1691 | 550.2429 | 0.22 | 0.823 | -955.4799 | 1201.818
family3 | 0 (omitted)
family4 | 329.9524 | 551.8291 | 0.60 | 0.550 | -751.806 | 1411.711
family5 | 32.18354 | 745.6594 | 0.04 | 0.966 | -1429.543 | 1493.91
family6 | 2770.988 | 2215.259 | 1.25 | 0.211 | -1571.616 | 7113.592
_Imembersh~2 | -1059.695 | 1080.821 | -0.98 | 0.327 | -3178.443 | 1059.054
_Imembersh~3 | 880.549 | 273.4528 | 3.22 | 0.001 | 344.4955 | 1416.602
_Imembersh~4 | -75.78216 | 315.5442 | -0.24 | 0.810 | -694.3479 | 542.7836
_Imembersh~5 | 830.1518 | 702.8553 | 1.18 | 0.238 | -547.6655 | 2207.969
_Imembersh~6 | 330.9561 | 266.9595 | 1.24 | 0.215 | -192.3685 | 854.2806
_Imembersh~7 | -866.4884 | 423.5197 | -2.05 | 0.041 | -1696.72 | -36.25676
_cons | 2258.562 | 552.2543 | 4.09 | 0.000 | 1175.97 | 3341.154
Figure 7

c) In Figure 7, the dummy variables with p-values below 0.05 are family1 (0.002), membership_state 3 (0.001) and membership_state 7 (0.041). Those with p-values above 0.05 are family2, family4, family5, family6 and membership_state 2, 4, 5 and 6. Dummy variables with p-values below 0.05 are statistically significant at the 5% level, and those with p-values above 0.05 are not. The t-statistics lead to the same conclusion: a coefficient is significant at the 5% level when its t-statistic exceeds about 1.96 in absolute value (for example, 3.12 for family1) and insignificant otherwise (for example, 0.22 for family2).

d) I include family1, family2, membership_state 7 and the other dummy variables to reduce the risk of omitted variable bias: leaving out a relevant variable that is correlated with the included regressors violates the conditional mean independence assumption. Including an irrelevant variable, by contrast, does not bias the estimated coefficients, although it can reduce their precision.

e) The “homoscedasticity” assumption of the baseline linear regression model is addressed by estimating the model with heteroskedasticity-robust standard errors.
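The heteroskedasticity-robust standard errors mentioned in part e can be sketched as a minimal NumPy implementation of the White "sandwich" estimator with the n/(n-k) small-sample scaling usually called HC1. The function name ols_with_robust_se and the simulated data are hypothetical illustrations, not the report's own code or the BUPA sample, and we assume the report's software used a comparable robust estimator.

```python
import numpy as np


def ols_with_robust_se(X, y):
    """OLS estimates with conventional and HC1 heteroskedasticity-robust SEs.

    X is an (n, k) regressor matrix that already includes a column of ones
    for the intercept; y is a length-n outcome vector.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    resid = y - X @ beta

    # Conventional SEs: valid only if the error variance is constant.
    sigma2 = resid @ resid / (n - k)
    se_conventional = np.sqrt(np.diag(sigma2 * XtX_inv))

    # Sandwich estimator (X'X)^-1 [sum_i e_i^2 x_i x_i'] (X'X)^-1,
    # scaled by n / (n - k) (the HC1 correction).
    meat = X.T @ (X * resid[:, None] ** 2)
    cov_hc1 = (n / (n - k)) * XtX_inv @ meat @ XtX_inv
    se_robust = np.sqrt(np.diag(cov_hc1))
    return beta, se_conventional, se_robust


# Simulated data (not the BUPA sample): the error spread grows with x,
# so the homoscedasticity assumption fails by construction.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=500)
y = 2.0 + 0.5 * x + rng.normal(0.0, 0.2 + 0.3 * x)
X = np.column_stack([np.ones_like(x), x])

beta, se_conv, se_rob = ols_with_robust_se(X, y)
print("coefficients:   ", beta.round(3))
print("conventional SE:", se_conv.round(3))
print("robust (HC1) SE:", se_rob.round(3))
```

The coefficient estimates are identical under both approaches; only the standard errors, and therefore the t-statistics and p-values, change.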
The “no perfect collinearity” assumption is respected because not all the dummy variables for family type are included (family3 is the omitted base category in Figure 7). The “linear in parameters” assumption requires the model to be linear in its coefficients, so nonlinear relationships must be handled by choosing the correct specification of the explanatory variables (for example, the squared age term in section 4).

f) The variables family1, membership_state 3 and membership_state 7, whose p-values are below 0.05, are chosen as the key drivers of total_benefit_paid_hospital.
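The selection rule in part f (p-value below 0.05) is equivalent to |t| > 1.96 under the large-sample normal approximation, which is reasonable with roughly 6,785 observations. A minimal sketch using the coefficient and standard-error pairs from Figure 7; the labels follow the report's prose naming (membership_state 3 and so on) rather than the truncated _Imembersh~ names, and the helper names two_sided_p and significant are hypothetical. Because it uses the normal rather than the t distribution, the printed p-values can differ slightly from Figure 7.

```python
import math

# (coefficient, standard error) pairs from Figure 7;
# dependent variable: total_benefit_paid_hospital.
FIGURE_7 = {
    "family1": (1819.096, 582.5797),
    "family2": (123.1691, 550.2429),
    "family4": (329.9524, 551.8291),
    "family5": (32.18354, 745.6594),
    "family6": (2770.988, 2215.259),
    "membership_state2": (-1059.695, 1080.821),
    "membership_state3": (880.549, 273.4528),
    "membership_state4": (-75.78216, 315.5442),
    "membership_state5": (830.1518, 702.8553),
    "membership_state6": (330.9561, 266.9595),
    "membership_state7": (-866.4884, 423.5197),
}


def two_sided_p(t):
    """Two-sided p-value from the standard normal (large-sample approximation)."""
    return math.erfc(abs(t) / math.sqrt(2))


def significant(table, crit=1.96):
    """Labels whose |t| = |coef / se| exceeds the critical value."""
    return [name for name, (coef, se) in table.items() if abs(coef / se) > crit]


for name, (coef, se) in FIGURE_7.items():
    t = coef / se
    print(f"{name:18s} t = {t:6.2f}  p = {two_sided_p(t):.3f}")

print(significant(FIGURE_7))  # ['family1', 'membership_state3', 'membership_state7']
```

The same comparison of |t| with 1.96 applies to the tenure_years coefficient in Figure 3 (t = 2.25) and the “HOSPITAL AND EXTRAS” dummy in Figure 4 (t = -1.70).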
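The “no perfect collinearity” point can also be checked numerically. In Figure 6 the means of the six family dummies sum to one, so every policy falls into exactly one family type and the six dummies add up to the intercept column. A minimal sketch with made-up category codes (the helper dummy_columns is hypothetical, and the codes are not BUPA data) shows that an intercept plus all six dummies gives a rank-deficient regressor matrix, while dropping one base category restores full rank. This is why Figure 7 reports family3 as omitted.

```python
import numpy as np


def dummy_columns(codes, n_levels, drop_base=False):
    """0/1 indicator columns, one per category code 0..n_levels-1.
    With drop_base=True, code 0 is left out as the base category."""
    codes = np.asarray(codes)
    D = (codes[:, None] == np.arange(n_levels)).astype(float)
    return D[:, 1:] if drop_base else D


# Hypothetical family-type codes (0-5) for 12 policies; not BUPA data.
family = np.array([0, 1, 2, 3, 4, 5, 1, 3, 3, 0, 1, 4])
intercept = np.ones((family.size, 1))

X_all = np.hstack([intercept, dummy_columns(family, 6)])
X_base = np.hstack([intercept, dummy_columns(family, 6, drop_base=True)])

# The six indicators always sum to the intercept column, so X_all loses a rank.
print(X_all.shape[1], np.linalg.matrix_rank(X_all))    # 7 6
print(X_base.shape[1], np.linalg.matrix_rank(X_base))  # 6 6
```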