RMIT Basic Econometrics Research Report on BUPA Financial Benefits

Added on 2023/03/31

AI Summary

This research report analyzes the financial benefits paid by BUPA, an Australian health insurance company, using econometric methods. The report utilizes a dataset of 38 variables and 6786 observations to predict total and hospital benefits based on insurance product features and customer characteristics. Preliminary statistics, including descriptive statistics and scatter plots, are presented. The report discusses model estimation, the use of dummy variables, quadratic specifications, and model specification considerations. Key findings include the impact of various factors such as tenure years, contributor age, and daily premium on total benefits. The analysis also explores the statistical significance of different dummy variables representing product types and family structures. The report concludes with a discussion of model design, addressing assumptions like homoscedasticity and multicollinearity, and identifies key drivers of total hospital benefits paid.

RESEARCH REPORT ON ECONOMETRICS
Executive summary
In this research report, immediately after the introduction, we discuss the preliminary
statistics, which comprise the descriptive statistics, a scatter plot and the tables of results. We
then discuss model estimation, dummy variables, the quadratic specification, model
specification and, last but not least, model design.
Introduction
This research report uses the given dataset of 38 variables and 6786 observations to
analyze the patterns of financial benefits paid by BUPA, an Australian health
insurance company. The attributes describe not only the features of the insurance products but
also the characteristics of the customers. The main aim of this research is to predict the total benefit
and the hospital benefit.
1 Preliminary statistics
a)
Variable Obs Mean Std. Dev. Min Max
tenure_years 6,785 11.53378 9.438747 0 42.4
total_bene~t 6,786 4581.219 9144.589 0 95971.84
contributo~e 6,785 49.24628 16.0488 18 104
daily_prem~m 6,785 12.61911 6.615719 .74 40.66
lives_on_m~p 6,785 2.391304 1.416717 1 8
calls_to_b~a 6,785 2.046721 1.703397 1 24
Figure 1
b) Figure 1 above shows that the variable total_benefit has 6786 observations, whereas each of
the other variables has 6785 observations. The mean of calls_to_bupa (2.05) is relatively
close to the mean of lives_on_membership (2.39). The mean of total_benefit (4581.22)
is higher than that of the other variables in Figure 1, and total_benefit
also has the largest standard deviation, approximately 9144.59.
c)
[Scatter plot of total_benefit (axis range 0 to 100,000) against calls_to_bupa (axis range 0 to 25)]
Figure 2
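The highest data point for the variable calls_to_bupa is approximately 25, while that of the
variable total_benefit is around 100,000. Most of the data points lie between 0 and 10 on the
calls_to_bupa axis. This clustering alone does not establish a correlation between the two
variables; the strength of the relationship has to be judged from how total_benefit changes as
calls_to_bupa increases.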
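Whether two variables are correlated can be checked numerically rather than read off a scatter plot. The following is a minimal sketch of the sample Pearson correlation coefficient. The function name pearson_correlation and the data values are illustrative assumptions and are not taken from the BUPA dataset.

```python
import math


def pearson_correlation(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares of deviations from the means.
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical values, NOT taken from the BUPA dataset.
calls = [1, 1, 2, 3, 5, 8]
benefit = [0.0, 500.0, 1200.0, 900.0, 4000.0, 7500.0]
print(round(pearson_correlation(calls, benefit), 3))
```

A value close to +1 or -1 indicates a strong linear relationship, while a value close to 0 indicates a weak one.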
2) Model estimation

a)
total_bene~t Coef. Std. Err. t P>|t| [95% Conf. Interval]
tenure_years 32.11981 14.27155 2.25 0.024 4.1431 60.09653
contributo~e 94.04481 8.561886 10.98 0.000 77.26082 110.8288
daily_prem~m 254.6578 20.17264 12.62 0.000 215.1131 294.2025
lives_on_m~p -334.5192 89.91961 -3.72 0.000 -510.7898 -158.2485
calls_to_b~a 541.1893 62.41209 8.67 0.000 418.842 663.5366
_cons -3941.208 444.4015 -8.87 0.000 -4812.374 -3070.042
Figure 3
b) The model is given as follows:
Total_benefit = -3941.21 + 32.12*tenure_years + 94.04*contributor_age + 254.66*daily_premium
- 334.52*lives_on_membership + 541.19*calls_to_bupa
Holding the other variables constant, if daily premium decreases by $1.50, the predicted total
benefit decreases by 1.5 * 254.66, or about $381.99, because the coefficient on daily_premium is
positive. If lives_on_membership increases by 2 units, the predicted total benefit decreases by
2 * 334.52, or about $669.04.
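The arithmetic behind these predicted changes can be checked with a short sketch. It uses the coefficients reported in Figure 3 with full variable names. The function name predicted_change is an illustrative assumption, and the calculation holds all other regressors fixed.

```python
# Coefficients copied from Figure 3 (the intercept is not needed for changes).
coefficients = {
    "tenure_years": 32.11981,
    "contributor_age": 94.04481,
    "daily_premium": 254.6578,
    "lives_on_membership": -334.5192,
    "calls_to_bupa": 541.1893,
}


def predicted_change(variable, delta):
    """Change in predicted total_benefit when one regressor changes by
    `delta`, holding every other regressor fixed."""
    return coefficients[variable] * delta


premium_effect = predicted_change("daily_premium", -1.5)
lives_effect = predicted_change("lives_on_membership", 2)
print(f"{premium_effect:.2f}")  # -381.99
print(f"{lives_effect:.2f}")    # -669.04
```

A negative result means the predicted total benefit falls. Both changes here are decreases, of roughly $382 and $669 respectively.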
c) Suppose this is the correct model specification. Then tenure_years has a positive and
statistically significant effect on total benefit at the 5% level: its coefficient is positive (32.12),
its p-value of 0.024 is less than 0.05, and its t-statistic of 2.25 exceeds the 5% critical value of
approximately 1.96.
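The t-statistic in Figure 3 is the coefficient divided by its standard error, and it is compared with a critical value rather than with 0.05. The sketch below reproduces this for tenure_years. It assumes the large-sample critical value of 1.96, so the confidence interval differs slightly from the one in Figure 3, which is based on the exact t distribution. The function names are illustrative.

```python
def t_statistic(coef, std_err):
    """t-statistic for the null hypothesis that the coefficient is zero."""
    return coef / std_err


def confidence_interval(coef, std_err, critical_value=1.96):
    """Approximate 95% confidence interval (normal approximation)."""
    margin = critical_value * std_err
    return coef - margin, coef + margin


# tenure_years row of Figure 3.
coef, std_err = 32.11981, 14.27155
t_stat = t_statistic(coef, std_err)
lower, upper = confidence_interval(coef, std_err)
significant = abs(t_stat) > 1.96

print(f"t = {t_stat:.2f}")
print(f"95% CI = ({lower:.2f}, {upper:.2f})")
print(f"significant at 5%: {significant}")
```

The interval excludes zero, which is the same conclusion as the p-value of 0.024 being below 0.05.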
3) Dummy variables
a)

total_bene~t Coef. Std. Err. t P>|t| [95% Conf. Interval]
_Iproduct_~2 -4625.837 514.8882 -8.98 0.000 -5635.18 -3616.494
_Iproduct_~3 -651.5264 383.4755 -1.70 0.089 -1403.259 100.206
_Iproduct_~4 0 (omitted)
package1 0 (omitted)
package2 0 (omitted)
package3 0 (omitted)
package4 -4463.642 1816.715 -2.46 0.014 -8024.975 -902.3092
state1 -927.0557 790.3415 -1.17 0.241 -2476.373 622.2619
state2 -2303.604 1422.861 -1.62 0.105 -5092.859 485.651
state3 391.729 800.4683 0.49 0.625 -1177.44 1960.898
state4 -452.4229 820.2857 -0.55 0.581 -2060.441 1155.595
state5 0 (omitted)
state6 -638.4669 798.4277 -0.80 0.424 -2203.636 926.7023
state7 -1800.408 879.0925 -2.05 0.041 -3523.706 -77.11072
_cons 6108.465 844.4256 7.23 0.000 4453.125 7763.804
Figure 4
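b) From Figure 4 above, the estimated coefficient on “HOSPITAL AND EXTRAS” is -651.53. Its
p-value is 0.089, which is greater than 0.05, and the absolute value of its t-statistic, 1.70, is
below the 5% critical value of approximately 1.96. Therefore the coefficient on the dummy
variable “HOSPITAL AND EXTRAS” is not statistically significant at the 5% level, although it is
significant at the 10% level, so its contribution to predicting total_benefit is weak.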
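Each dummy coefficient is measured relative to a base category that is left out of the regression. The following is a minimal sketch of how such 0/1 columns can be built from a categorical variable. The helper make_dummies is hypothetical, and apart from “HOSPITAL AND EXTRAS” the category labels are invented for illustration.

```python
def make_dummies(values, base):
    """Return one 0/1 column per category, excluding the base category."""
    categories = sorted(set(values))
    if base not in categories:
        raise ValueError("base category must appear in the data")
    return {
        category: [1 if v == category else 0 for v in values]
        for category in categories
        if category != base
    }


# "HOSPITAL AND EXTRAS" appears in the report; the other labels are hypothetical.
product_type = [
    "HOSPITAL ONLY",
    "HOSPITAL AND EXTRAS",
    "EXTRAS ONLY",
    "HOSPITAL AND EXTRAS",
    "HOSPITAL ONLY",
]
dummies = make_dummies(product_type, base="HOSPITAL ONLY")
for name, column in dummies.items():
    print(name, column)
```

With the base category excluded, the coefficient on each remaining dummy is the estimated difference in the dependent variable between that category and the base.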
4) Quadratic specification
a)

total_bene~t Coef. Std. Err. t P>|t| [95% Conf. Interval]
contributo~e -17.16787 41.31587 -0.42 0.678 -98.15994 63.8242
contributo~q 1.499754 .3907289 3.84 0.000 .733803 2.265705
_cons 1403.924 1020.353 1.38 0.169 -596.2871 3404.135
Figure 5
b) Total_benefit = 1403.924 - 17.17*contributor_age + 1.50*contributor_age_sq
From the model written above, the marginal effect of contributor_age on total benefit is
-17.17 + 2*1.50*contributor_age. It equals zero at the turning point
contributor_age = 17.17/(2*1.50), or approximately 5.7 years. Because the coefficient on
contributor_age_sq is positive, the marginal effect is positive for all ages above this turning
point. The minimum contributor age in the data is 18 (Figure 1), so the marginal effect is
positive for every observation in the sample.
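The turning point of a quadratic specification is the age at which the derivative with respect to contributor_age equals zero. The sketch below computes it from the coefficients in Figure 5. The variable and function names are illustrative assumptions.

```python
b_age = -17.16787     # coefficient on contributor_age (Figure 5)
b_age_sq = 1.499754   # coefficient on contributor_age_sq (Figure 5)


def marginal_effect(age):
    """Derivative of predicted total_benefit with respect to contributor_age."""
    return b_age + 2 * b_age_sq * age


# The marginal effect is zero where b_age + 2 * b_age_sq * age = 0.
turning_point = -b_age / (2 * b_age_sq)

print(f"turning point: {turning_point:.2f} years")
print(f"marginal effect at age 18: {marginal_effect(18):.2f}")
```

The turning point lies well below the minimum age of 18 reported in Figure 1, so within the sample an additional year of age is always associated with a higher predicted total benefit.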
5) Model specification
The R-square, or coefficient of determination, measures the proportion of the variation in
the dependent variable (total_benefit) that is explained by the independent variables
(contributor_age and contributor_age_sq). From question 4 above the R-square is 0.06, or 6%,
which tells us that the model fits the data poorly: only 6% of the variation in total_benefit is
explained by the model, and the remaining 94% is left unexplained. Some additional explanatory
variables that could be included in the model to improve its goodness of fit are:
• State where membership is held
• Gender of main policy holder
• Description of family structure (plus refers to dependent aged 21-25 on parent policy)
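The coefficient of determination discussed in this section can be written as one minus the ratio of the residual sum of squares to the total sum of squares. The following is a minimal sketch; the function name r_squared and the observed and fitted values are hypothetical and are not taken from the BUPA dataset.

```python
def r_squared(y, y_hat):
    """Share of the variation in y that is explained by the fitted values."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))  # unexplained variation
    ss_tot = sum((a - mean_y) ** 2 for a in y)            # total variation
    return 1 - ss_res / ss_tot


# Hypothetical observed and fitted values.
observed = [2.0, 4.0, 6.0, 8.0]
fitted = [3.0, 3.0, 7.0, 7.0]
print(round(r_squared(observed, fitted), 2))
```

In this example the residual sum of squares is 4 and the total sum of squares is 20, so the fitted values explain 80% of the variation.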

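Omitting potentially relevant explanatory variables has two implications. The goodness of fit
is lower than it could be, and, if an omitted variable is correlated with the included explanatory
variables, the estimated coefficients suffer from omitted variable bias and cannot be interpreted
as the effect of each variable alone.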
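The consequence of leaving out a relevant variable can be illustrated with a small constructed example. In the sketch below, y depends on x1 and x2, and x2 is correlated with x1. When x2 is omitted, the slope on x1 absorbs part of the effect of x2. All names and numbers are hypothetical.

```python
def slope(x, y):
    """OLS slope from a simple regression of y on x (with an intercept)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    return s_xy / s_xx


# Hypothetical data: the true coefficients are beta1 on x1 and beta2 on x2.
beta1, beta2 = 2.0, 3.0
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
y = [1.0 + beta1 * a + beta2 * b for a, b in zip(x1, x2)]

short_slope = slope(x1, y)   # regression of y on x1 that omits x2
delta = slope(x1, x2)        # how x2 moves with x1

print(f"slope when x2 is omitted: {short_slope:.2f}")
print(f"beta1 + beta2 * delta:    {beta1 + beta2 * delta:.2f}")
```

The slope from the short regression equals beta1 + beta2 * delta rather than beta1, so the estimate is biased whenever the omitted variable both matters (beta2 is not zero) and is correlated with the included one (delta is not zero).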
6) Model design
a)
Variable Obs Mean Std. Dev. Min Max
total_be~tal 6,786 2937.928 8080.162 0 82794
_Imembersh~2 6,785 .0084009 .0912773 0 1
_Imembersh~3 6,785 .2187178 .4134072 0 1
_Imembersh~4 6,785 .13972 .3467218 0 1
_Imembersh~5 6,785 .0206338 .1421653 0 1
_Imembersh~6 6,785 .2374355 .4255427 0 1
_Imembersh~7 6,785 .0645542 .2457556 0 1
family1 6,785 .1585851 .3653157 0 1
family2 6,785 .3995578 .4898436 0 1
family3 6,785 .0343405 .1821156 0 1
family4 6,785 .3709654 .4830988 0 1
family5 6,785 .0344878 .182492 0 1
family6 6,785 .0020634 .0453808 0 1
Figure 6
b)
total_be~tal Coef. Std. Err. t P>|t| [95% Conf. Interval]
family1 1819.096 582.5797 3.12 0.002 677.0564 2961.135
family2 123.1691 550.2429 0.22 0.823 -955.4799 1201.818
family3 0 (omitted)
family4 329.9524 551.8291 0.60 0.550 -751.806 1411.711
family5 32.18354 745.6594 0.04 0.966 -1429.543 1493.91
family6 2770.988 2215.259 1.25 0.211 -1571.616 7113.592
_Imembersh~2 -1059.695 1080.821 -0.98 0.327 -3178.443 1059.054
_Imembersh~3 880.549 273.4528 3.22 0.001 344.4955 1416.602
_Imembersh~4 -75.78216 315.5442 -0.24 0.810 -694.3479 542.7836
_Imembersh~5 830.1518 702.8553 1.18 0.238 -547.6655 2207.969
_Imembersh~6 330.9561 266.9595 1.24 0.215 -192.3685 854.2806
_Imembersh~7 -866.4884 423.5197 -2.05 0.041 -1696.72 -36.25676
_cons 2258.562 552.2543 4.09 0.000 1175.97 3341.154
Figure 7
c) In Figure 7 above, the dummy variables whose p-values are less than 0.05 are family 1,
membership_state 3 and membership_state 7. The dummy variables whose p-values are greater
than 0.05 include family 2, family 4, family 5, family 6, membership_state 2 and so on. When
the p-value of a dummy variable is less than 0.05 we conclude that it is statistically significant
at the 5% level; when the p-value is greater than 0.05 it is statistically insignificant. The
t-statistic, which is used to test the hypothesis that a coefficient equals zero, leads to the same
conclusion: a coefficient is statistically significant at the 5% level when the absolute value of
its t-statistic exceeds the critical value of approximately 1.96, and insignificant otherwise.
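The two decision rules can be applied mechanically to the rows of Figure 7. The sketch below copies the t-statistics and p-values from that figure and uses the figure's own variable labels as keys. The critical value of 1.96 is the large-sample approximation and is an assumption of this sketch.

```python
# (t-statistic, p-value) pairs copied from Figure 7.
figure7 = {
    "family1": (3.12, 0.002),
    "family2": (0.22, 0.823),
    "family4": (0.60, 0.550),
    "family5": (0.04, 0.966),
    "family6": (1.25, 0.211),
    "_Imembersh~2": (-0.98, 0.327),
    "_Imembersh~3": (3.22, 0.001),
    "_Imembersh~4": (-0.24, 0.810),
    "_Imembersh~5": (1.18, 0.238),
    "_Imembersh~6": (1.24, 0.215),
    "_Imembersh~7": (-2.05, 0.041),
}

ALPHA = 0.05        # significance level
CRITICAL_T = 1.96   # approximate two-sided 5% critical value

# Rule 1: p-value below the significance level.
by_p_value = [name for name, (t, p) in figure7.items() if p < ALPHA]
# Rule 2: absolute t-statistic above the critical value.
by_t_stat = [name for name, (t, p) in figure7.items() if abs(t) > CRITICAL_T]

print(by_p_value)
print(by_t_stat)
```

Both rules select the same three variables: family1, _Imembersh~3 and _Imembersh~7.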
d) I choose to include the variables family 1, family 2, membership_state 7 and many others in
order to reduce the risk of omitted variable bias, which arises when the conditional mean
independence assumption is violated. Including an irrelevant variable does not bias the
estimated coefficients.
e) The assumption of “homoscedasticity” in the baseline linear regression model is addressed by
estimating the model with heteroskedasticity-robust standard errors.

The assumption of “no perfect collinearity” is addressed by not including all the dummy
variables for family type; one category is left out as the base so that this assumption is not
violated.
The assumption “linear in parameters” implies that we have to choose the correct
specification of the explanatory variables so that nonlinearities are dealt with.
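To illustrate what heteroskedasticity-robust standard errors change, the sketch below compares the conventional standard error of a simple regression slope with the basic White (HC0) robust version. This is a simplified one-regressor illustration on hypothetical data, not the estimation used in the report, and statistical packages often apply an additional small-sample scaling to the robust formula.

```python
import math


def simple_ols_with_se(x, y):
    """Simple regression of y on x.

    Returns the slope, its conventional standard error and its
    HC0 (White) heteroskedasticity-robust standard error.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((a - mean_x) ** 2 for a in x)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / s_xx
    intercept = mean_y - slope * mean_x
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]

    # Conventional SE: assumes one common error variance (homoscedasticity).
    sigma2 = sum(e ** 2 for e in residuals) / (n - 2)
    conventional_se = math.sqrt(sigma2 / s_xx)

    # HC0 SE: weights each squared residual by its own squared deviation in x,
    # so the error variance may differ across observations.
    meat = sum(((a - mean_x) ** 2) * (e ** 2) for a, e in zip(x, residuals))
    robust_se = math.sqrt(meat) / s_xx
    return slope, conventional_se, robust_se


# Hypothetical data whose error spread grows with x.
x = [float(i) for i in range(1, 21)]
y = [2.0 + 3.0 * xi + (xi if i % 2 == 0 else -xi) for i, xi in enumerate(x)]

slope, conventional_se, robust_se = simple_ols_with_se(x, y)
print(f"slope = {slope:.3f}")
print(f"conventional SE = {conventional_se:.3f}")
print(f"robust (HC0) SE = {robust_se:.3f}")
```

The slope estimate is the same under both approaches. Only the standard error, and therefore the t-statistic and p-value, changes when the robust formula is used.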
f) The variables family 1, membership_state 3 and membership_state 7, whose p-values are less
than 0.05, are chosen as the key drivers of total_benefit_paid_hospital.