University of Wollongong: Regression Model for Insurance Premium

Added on  2023/04/11

Report
AI Summary
This report focuses on developing a regression model for determining motor driver's insurance premiums, with Return on Assets (ROA) as a proxy for profitability. The model incorporates five independent variables: company leverage, liquidity, company size, the volume of capital, and underwriting risk. The report addresses the construction of a multiple linear regression model, detailing its deterministic and probabilistic components, and the importance of fixed predictors. It emphasizes the use of 1000-variable datasets for training, assessing underfitting and overfitting, and determining the generalization error. The application of machine learning concepts, including data classification and regression model selection, is also discussed. The report highlights the use of tools like SPSS, Microsoft Excel, and Python for data analysis and regression modeling, referencing key literature on multiple linear regression and model-based prediction.
Regression model for Premium payable by customers who purchase motor driver’s
insurance.
In building the regression model, the response variable in the data set is Return on Assets
(ROA), which serves as a proxy for the profitability of motor driver’s insurance companies.
The data set contains five independent variables: company leverage, liquidity, company size,
volume of capital, and underwriting risk (Politis, 2015).
The issues that need to be considered in building the Regression model
A multiple linear regression model is a probabilistic model that includes more than one
independent variable. The general multiple linear regression model is of the form,
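$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2),$$
where $\beta_0, \beta_1, \beta_2, \dots, \beta_k$ are the regression coefficients,
$x_{i1}, x_{i2}, \dots, x_{ik}$ are the predictors for observation $i = 1, 2, \dots, n$, and
$\varepsilon_i$ is the error term.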
Thus, we have n observations on y and the associated x variables in the above equation.
The regression coefficient of a predictor quantifies the linear trend in y with respect to that
predictor: it gives the expected change in y for a one-unit change in the predictor while all
other predictors are held fixed.
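The interpretation of the coefficients can be illustrated with a minimal sketch in Python. The data below are synthetic: the sample size, the coefficient values, the noise level and the variable names are all assumptions made for illustration and are not taken from the report's data set. The model is fitted by ordinary least squares with NumPy.

```python
import numpy as np

# Synthetic data: every name and number here is an illustrative assumption,
# not a figure from the report.
rng = np.random.default_rng(0)
n, k = 1000, 5
predictors = ["leverage", "liquidity", "size", "capital_volume", "underwriting_risk"]

X = rng.normal(size=(n, k))                              # predictor values x_i1 .. x_ik
true_beta = np.array([0.5, -1.0, 2.0, 0.3, -0.7, 1.5])   # beta_0 .. beta_k (made up)
sigma = 0.5
eps = rng.normal(scale=sigma, size=n)                    # error term, N(0, sigma^2)

X_design = np.column_stack([np.ones(n), X])              # column of ones carries beta_0
roa = X_design @ true_beta + eps                         # simulated response

# Ordinary least squares estimate of beta_0 .. beta_k
beta_hat, *_ = np.linalg.lstsq(X_design, roa, rcond=None)

def predict(x_row):
    """Fitted value for one observation (intercept plus weighted predictors)."""
    return beta_hat[0] + x_row @ beta_hat[1:]

# Raise one predictor by one unit with the others held fixed: the fitted
# value moves by that predictor's estimated coefficient.
base = np.zeros(k)
bumped = base.copy()
bumped[2] += 1.0                                         # one-unit change in "size"
effect_of_size = predict(bumped) - predict(base)

for name, b in zip(["intercept"] + predictors, beta_hat):
    print(f"{name:>18}: {b: .3f}")
```

Here `effect_of_size` equals the fitted coefficient of the third predictor, which is the "one unit change, all else fixed" reading described above.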
Considerations for concepts
The multiple linear regression model given in the equation above has two components: the
deterministic component and the probabilistic component. The sum
$\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik}$ is the deterministic
component of the model, and $\varepsilon_i$ is the probabilistic component. In the multiple
linear regression model, the predictors are assumed to be fixed, i.e. $x_1, x_2, \dots, x_k$
are fixed variables (either discrete or continuous) that are controlled by the experimenter,
while y is a continuous random variable (Olive, 2017).
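A short worked step shows why the split into the two components matters. Writing $\sigma^2$ for the variance of the error term and treating the predictors as fixed, as assumed above, the mean of the response comes entirely from the deterministic component and its variance entirely from the probabilistic one:

```latex
\begin{aligned}
E[y_i]   &= \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} + E[\varepsilon_i]
          = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik}, \\
\operatorname{Var}(y_i) &= \operatorname{Var}(\varepsilon_i) = \sigma^2, \\
y_i &\sim N\!\left(\beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik},\; \sigma^2\right).
\end{aligned}
```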
The number of independent variables chosen will be five, as indicated above. 1000-variable
data sets will be used as examples in order to determine how much training data is required,
to diagnose under-fitting and over-fitting, and to estimate the generalization error. The
model will have the capacity to determine a preset number of outcomes. The aspects of machine
learning that we have studied will be helpful when it comes to training on a corpus,
classifying the data and choosing the regression model to use. It is also worth noting that
the regressions can be run with several tools, including SPSS statistical software, Microsoft
Excel or the Python programming language.
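One way to examine under-fitting, over-fitting and the generalization error is to hold out part of the data and compare the error on the training portion with the error on the held-out portion. The sketch below is an assumption-laden illustration, not the report's procedure: the data are synthetic, the 80/20 split and the polynomial degrees are arbitrary choices, and `design` and `fit_and_score` are hypothetical helper names.

```python
import numpy as np

# Illustrative synthetic data; sizes, coefficients and noise level are assumptions.
rng = np.random.default_rng(42)
n, k = 1000, 5
X = rng.normal(size=(n, k))
y = 1.0 + X @ np.array([0.8, -0.5, 1.2, 0.4, -0.9]) + rng.normal(scale=1.0, size=n)

# Hold out 20% of the observations to estimate the generalization error.
idx = rng.permutation(n)
train, test = idx[:800], idx[800:]

def design(X, degree):
    """Intercept plus element-wise powers of each predictor up to `degree`.
    degree=0 is an intercept-only model; degree=1 is the linear model."""
    cols = [np.ones(len(X))]
    for d in range(1, degree + 1):
        cols.append(X ** d)
    return np.column_stack(cols)

def fit_and_score(degree):
    """Fit by least squares on the training rows; return (train MSE, held-out MSE)."""
    A_train, A_test = design(X[train], degree), design(X[test], degree)
    beta, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
    mse_train = np.mean((y[train] - A_train @ beta) ** 2)
    mse_test = np.mean((y[test] - A_test @ beta) ** 2)
    return mse_train, mse_test

# Too simple (0), matched to the data-generating model (1), more flexible (8).
results = {d: fit_and_score(d) for d in (0, 1, 8)}
for d, (tr, te) in results.items():
    print(f"degree {d}: train MSE {tr:.3f}, held-out MSE {te:.3f}")
```

A model that under-fits shows a high error on both portions. Adding terms can only lower the training error, so the held-out error, which estimates the generalization error, is the figure to compare across candidate models.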
References
Olive, D. J. (2017). Multiple Linear Regression. Linear Regression, 17-83.
doi:10.1007/978-3-319-55252-1_2
Politis, D. N. (2015). Model-Based Prediction in Regression. Model-Free Prediction and
Regression, 33-56. doi:10.1007/978-3-319-21347-7_3