Regression Analysis for House Market Value Estimation

Added on 2023/05/30

STATISTICS
STUDENT ID:
[Pick the date]

1) Introduction
The key aim of this report is to build an appropriate regression model for the variables provided. The dataset contains five variables, all of which are quantitative, so regression analysis can be performed. Since data have been provided for each of the past 15 years, the sample size is 15. The primary objective is to develop a multiple regression model in which market price is the dependent variable and the remaining four variables serve as independent variables. All variables are measured on a ratio or interval scale, which allows them to be represented in a multiple regression model, and they appear suitable for estimating the market value of a house. Once the multiple regression model has been developed, it will be refined by removing the independent variables that do not have a significant relationship with the dependent variable.
2) Scatter Plot
In this section, a scatter plot is drawn between each independent variable and the dependent variable.
The scatter plot between the independent variable (Sydney Price Index) and the dependent variable (market price) is shown below.
Since the line of best fit in the plot above has a positive slope, the linear relationship between the two variables is positive. The scatter points deviate only slightly from the line of best fit, which indicates that the correlation between the variables is strong. It is therefore fair to conclude that Sydney Price Index and market price have a strong positive relationship (Flick, 2015).
The scatter plot between the independent variable (annual % change) and the dependent variable (market price) is shown below.
Since the line of best fit in the plot above has a positive slope, the linear relationship between the two variables is positive. However, the scatter points deviate considerably from the line of best fit, which indicates that the correlation between the variables is weak to moderate. It is therefore fair to conclude that annual % change and market price have a positive but weak to moderate relationship (Eriksson and Kovalainen, 2015).
The scatter plot between the independent variable (age of house) and the dependent variable (market price) is shown below.
Since the line of best fit in the plot above has a negative slope, the linear relationship between the two variables is negative. The scatter points do not deviate much from the line of best fit, which indicates that the correlation between the variables is moderately strong. It is therefore fair to conclude that age of house and market price have a moderately strong negative relationship (Hair et al., 2015).
The scatter plot between the independent variable (area of house) and the dependent variable (market price) is shown below.

Since the line of best fit in the plot above has a positive slope, the linear relationship between the two variables is positive. The scatter points deviate moderately from the line of best fit, which indicates that the correlation between the variables is only moderate. It is therefore fair to conclude that area of house and market price have a positive but moderate relationship (Hillier, 2016).
3) Multiple Regression Model
The multiple regression model has been estimated in Excel, and the relevant output is shown below.
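Excel's regression output is based on ordinary least squares. As an illustration only, the following standard-library sketch estimates an intercept and four slopes by solving the normal equations (X'X)b = X'y. The helper names `solve` and `ols`, the 15 rows of data and the "true" coefficients are all hypothetical; the data are generated exactly from known coefficients so that the estimates can be checked.

```python
import random

def solve(a, b):
    """Solve the square linear system a @ x = b by Gaussian elimination
    with partial pivoting (a is a list of rows, b a list of values)."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def ols(X, y):
    """Ordinary least squares: return [intercept, b1, ..., bk] by solving
    the normal equations (X'X) b = X'y with an intercept column added."""
    rows = [[1.0] + list(x) for x in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical data: 15 observations of 4 predictors, with y built exactly
# from known coefficients so the estimates can be checked.
rng = random.Random(0)
true_coef = [500.0, 2.0, -5.0, 0.5, -2.5]
X = [[rng.uniform(0, 100) for _ in range(4)] for _ in range(15)]
y = [true_coef[0] + sum(b * v for b, v in zip(true_coef[1:], row)) for row in X]

coef = ols(X, y)
print([round(c, 4) for c in coef])
```

With real, noisy data the estimates would not match any "true" values exactly; the exact construction here only serves to check the arithmetic.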
4) Equation & Coefficients
The regression equation derived from the above Excel output is given below.
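A reconstruction of this equation from the intercept (548.98) and the slope coefficients quoted in section 5, rounded to two decimals, is given below. The variable abbreviations are illustrative, and market price is assumed to be in thousands of dollars, as the dollar interpretations in section 5 imply.

$$\widehat{\text{Price}} = 548.98 + 1.96\,\text{SPI} - 5.62\,\text{AnnualChange} + 0.52\,\text{Area} - 2.49\,\text{Age}$$

Here SPI is the Sydney Price Index, AnnualChange is the annual % change, Area is the house area and Age is the house age.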
Based on the regression equation above, the intercept is 548.98, the coefficients of the four independent variables are the slope coefficients, and the standard error of the estimate is 43.8878.

5) Coefficient interpretation and significance testing
The coefficients of the multiple regression model can be interpreted as follows. Each slope coefficient describes the effect of a one-unit change in that variable while the other independent variables are held constant.
Intercept – This coefficient indicates the predicted house market value when all the independent variables take the value zero, which is of course not a practical scenario.
Slope coefficient (Sydney Price Index) – This variable has a slope coefficient of 1.96. When the Sydney Price Index rises by 1 unit, the predicted house market price rises by $1,960. Since the coefficient is positive, the two variables move in the same direction (Fehr and Grossman, 2013).
Slope coefficient (Annual % change) – This variable has a slope coefficient of -5.62. When the annual % change rises by 1 unit, the predicted house market price falls by $5,620. Since the coefficient is negative, the two variables move in opposite directions (Hastie, Tibshirani and Friedman, 2014).
Slope coefficient (House Area) – This variable has a slope coefficient of 0.52. When the house area rises by 1 unit, the predicted house market price rises by $520. Since the coefficient is positive, the two variables move in the same direction.
Slope coefficient (House Age) – This variable has a slope coefficient of -2.49. When the house age rises by 1 unit, the predicted house market price falls by $2,490. Since the coefficient is negative, the two variables move in opposite directions (Fehr and Grossman, 2013).
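To illustrate how each slope coefficient is read, the sketch below evaluates the fitted equation with the rounded coefficients from this section (market price assumed to be in $000) and measures the change in predicted price when one variable is raised by one unit while the others stay fixed. The function names, dictionary keys and baseline house values are hypothetical.

```python
# Slope coefficients as quoted in the report (rounded to two decimals).
# Market price is assumed to be measured in thousands of dollars.
COEF = {
    "intercept": 548.98,
    "sydney_price_index": 1.96,
    "annual_pct_change": -5.62,
    "house_area": 0.52,
    "house_age": -2.49,
}

def predict_price(sydney_price_index, annual_pct_change, house_area, house_age):
    """Predicted market price ($000) from the four-variable regression equation."""
    return (COEF["intercept"]
            + COEF["sydney_price_index"] * sydney_price_index
            + COEF["annual_pct_change"] * annual_pct_change
            + COEF["house_area"] * house_area
            + COEF["house_age"] * house_age)

def marginal_effect_dollars(name, base):
    """Change in predicted price, in dollars, when variable `name` rises by
    one unit and every other variable keeps its value from `base`."""
    bumped = dict(base, **{name: base[name] + 1})
    return (predict_price(**bumped) - predict_price(**base)) * 1000

# Hypothetical baseline house (illustrative values, not from the dataset)
base = {"sydney_price_index": 150, "annual_pct_change": 5,
        "house_area": 200, "house_age": 10}
for name in base:
    print(f"{name}: {marginal_effect_dollars(name, base):+,.0f} dollars")
```

Because the equation is linear, the one-unit effect is the same whatever baseline values are chosen.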
The statistical significance of the slope coefficients has been tested below.
Sydney Price Index
H0: βSydney Price Index = 0, i.e. the population slope coefficient of this variable is zero and the variable is not significant.
H1: βSydney Price Index ≠ 0, i.e. the population slope coefficient of this variable is not zero and the variable is significant.
The significance level for this test is 5%.
The test is based on the t statistic. From the multiple regression output, the t statistic is 3.37 and the corresponding p value is 0.01. Since the p value is lower than the significance level, H0 is rejected and H1 is accepted (Flick, 2015). This implies that the slope coefficient of this independent variable is significant.
Annual % Change
H0: βAnnual%change = 0, i.e. the population slope coefficient of this variable is zero and the variable is not significant.
H1: βAnnual%change ≠ 0, i.e. the population slope coefficient of this variable is not zero and the variable is significant.
The significance level for this test is 5%.
The test is based on the t statistic. From the multiple regression output, the t statistic is -1.74 and the corresponding p value is 0.11. Since the p value is higher than the significance level, H0 cannot be rejected, and H1 is not accepted (Medhi, 2016). This implies that the slope coefficient of this independent variable is not significant.
Total Area
H0: βTotalArea = 0, i.e. the population slope coefficient of this variable is zero and the variable is not significant.
H1: βTotalArea ≠ 0, i.e. the population slope coefficient of this variable is not zero and the variable is significant.
The significance level for this test is 5%.

The test is based on the t statistic. From the multiple regression output, the t statistic is 1.64 and the corresponding p value is 0.14. Since the p value is higher than the significance level, H0 cannot be rejected, and H1 is not accepted (Hillier, 2016). This implies that the slope coefficient of this independent variable is not significant.
Age of House
H0: βAgeofhouse = 0, i.e. the population slope coefficient of this variable is zero and the variable is not significant.
H1: βAgeofhouse ≠ 0, i.e. the population slope coefficient of this variable is not zero and the variable is significant.
The significance level for this test is 5%.
The test is based on the t statistic. From the multiple regression output, the t statistic is -2.20 and the corresponding p value is 0.052. Since the p value is higher than the significance level, albeit only marginally, H0 cannot be rejected, and H1 is not accepted (Hastie, Tibshirani and Friedman, 2014). This implies that the slope coefficient of this independent variable is not significant at the 5% level.
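The decision rule applied in the four tests above can be summarised in a short sketch. With n = 15 observations and k = 4 predictors, the t tests have n - k - 1 = 10 degrees of freedom, for which the two-tailed 5% critical value from standard t tables is about 2.228; comparing |t| with this value should give the same decisions as comparing p with 0.05. The t and p values are those quoted in the text; the function name `decide` is illustrative.

```python
ALPHA = 0.05      # significance level used in the report
T_CRIT = 2.228    # two-tailed 5% critical t value, df = 15 - 4 - 1 = 10 (t tables)

# (t statistic, p value) for each slope, as quoted in the text
reported = {
    "Sydney Price Index": (3.37, 0.01),
    "Annual % Change": (-1.74, 0.11),
    "Total Area": (1.64, 0.14),
    "Age of House": (-2.20, 0.052),
}

def decide(t_stat, p_value):
    """Apply the p-value rule and the critical-value rule; they should agree."""
    by_p = p_value < ALPHA
    by_t = abs(t_stat) > T_CRIT
    if by_p != by_t:
        raise ValueError("p-value and critical-value rules disagree")
    return "reject H0" if by_p else "do not reject H0"

decisions = {name: decide(t, p) for name, (t, p) in reported.items()}
for name, decision in decisions.items():
    print(f"{name}: {decision}")
```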
6) Coefficient of Determination
For the multiple regression model, the R2 value is 0.7906. This means that, taken together, the four independent variables account for 79.06% of the variation in the dependent variable, house market price, leaving about 21% of the variation unexplained by the model. It is therefore fair to conclude that the regression model fits the data well (Medhi, 2016).
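Excel's regression output also reports the adjusted R2 and the overall F statistic, and both follow from R2, n and k. With only 15 observations and four predictors, the adjusted figure is a useful companion to R2. The sketch below computes both for the reported R2 of 0.7906; the 5% critical value of F with (4, 10) degrees of freedom, about 3.48 from standard F tables, is included for comparison.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2: penalises R^2 for the number of predictors k."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def overall_f(r2, n, k):
    """F statistic for H0: all k slope coefficients are zero."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

r2, n, k = 0.7906, 15, 4      # values from the report
F_CRIT = 3.48                 # approx. 5% critical F(4, 10) from standard tables

adj = adjusted_r2(r2, n, k)
f_stat = overall_f(r2, n, k)
print(f"adjusted R^2 = {adj:.4f}")
print(f"F({k}, {n - k - 1}) = {f_stat:.2f}, exceeds 5% critical value: {f_stat > F_CRIT}")
```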
7) Confidence interval
Based on the Excel multiple regression output, the 95% confidence intervals for the respective parameters are as follows.

Each interval is constructed so that, if the sampling were repeated many times, 95% of the intervals built in this way would contain the true population slope coefficient. For example, the interval for house age indicates, with 95% confidence, that the population slope coefficient of age lies between -5.01 and 0.03 (Flick, 2015). Because this interval includes zero, it is consistent with the earlier finding that the age coefficient is not significant at the 5% level.
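The interval quoted for house age can be approximately reproduced from the rounded figures in section 5: the standard error is recovered as |coefficient / t| (2.49 / 2.20), and the interval is coefficient ± t_crit × SE, with t_crit ≈ 2.228 for 10 degrees of freedom. Because the inputs are rounded, this sketch should match the Excel interval only to about two decimals; the function name is illustrative.

```python
T_CRIT = 2.228  # two-tailed 5% critical t value for 10 degrees of freedom (t tables)

def confidence_interval(coef, t_stat, t_crit=T_CRIT):
    """95% confidence interval for a slope coefficient.

    The standard error is recovered from the reported figures as
    |coef / t_stat|, then the interval is coef +/- t_crit * SE.
    """
    se = abs(coef / t_stat)
    margin = t_crit * se
    return coef - margin, coef + margin

# Rounded values quoted in the report for age of house
lo, hi = confidence_interval(-2.49, -2.20)
print(f"Age of house: [{lo:.2f}, {hi:.2f}]")
print("interval contains zero:", lo < 0 < hi)
```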
8) Revised regression model
The revised regression model uses house area as the only independent variable and house price as the dependent variable. The relevant output is shown below.
Based on this output, the estimated regression equation is as follows.
9) Models Comparison

For the multiple regression model, the coefficient of determination is 0.7906, whereas it is only 0.0981 for the revised simple regression model. The revised model therefore fits poorly, since it accounts for only 9.81% of the variation in house market prices (Hillier, 2016). In addition, the t statistic and corresponding p value of the slope coefficient in the revised model show that this coefficient is not significant (Eriksson and Kovalainen, 2015). It can therefore be concluded that the original multiple regression model is superior in terms of predictive capacity, goodness of fit, and the significance of both the overall model and at least one slope coefficient.
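The t statistic of the revised model is not quoted here, but for a regression with a single predictor it follows from R2 alone: t^2 = R2(n - 2)/(1 - R2). The sketch below applies this to R2 = 0.0981 and n = 15, compares |t| with the two-tailed 5% critical value for 13 degrees of freedom (about 2.160 from standard t tables), and computes the adjusted R2 of both models. It is a consistency check under these assumptions, not a substitute for the Excel output.

```python
import math

def simple_regression_t(r2, n):
    """|t| for the slope of a one-predictor regression, using t^2 = R^2 (n - 2) / (1 - R^2)."""
    return math.sqrt(r2 * (n - 2) / (1 - r2))

def adjusted_r2(r2, n, k):
    """Adjusted R^2 for k predictors and n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

N = 15
T_CRIT_13 = 2.160  # two-tailed 5% critical t value, 13 degrees of freedom (t tables)

t_revised = simple_regression_t(0.0981, N)
print(f"revised model: |t| = {t_revised:.2f}, significant at 5%: {t_revised > T_CRIT_13}")
print(f"adjusted R^2, multiple model: {adjusted_r2(0.7906, N, 4):.3f}")
print(f"adjusted R^2, revised model:  {adjusted_r2(0.0981, N, 1):.3f}")
```

Even after penalising the extra predictors, the multiple model's adjusted R2 remains far above that of the revised model.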
10) Market Price Estimation
Since only the area of the building has been provided and no other information is given, the price has to be estimated using the revised simple regression model, even though its explanatory power is low (R2 = 0.0981). The equation of this model is as follows.

References
Eriksson, P. and Kovalainen, A. (2015) Quantitative methods in business research. 3rd ed.
London: Sage Publications.
Fehr, F. H. and Grossman, G. (2013) An introduction to sets, probability and hypothesis
testing. 3rd ed. Ohio: Heath.
Flick, U. (2015) Introducing research methodology: A beginner's guide to doing a research
project. 4th ed. New York: Sage Publications.
Hair, J. F., Wolfinbarger, M., Money, A. H., Samouel, P., and Page, M. J. (2015) Essentials
of business research methods. 2nd ed. New York: Routledge.
Hastie, T., Tibshirani, R. and Friedman, J. (2014) The Elements of Statistical Learning. 4th
ed. New York: Springer Publications.
Hillier, F. (2016) Introduction to Operations Research. 6th ed. New York: McGraw Hill
Publications.
Medhi, J. (2016) Statistical Methods: An Introductory Text. 4th ed. Sydney: New Age
International.