Statistics and Regression Analysis of Property Data Assignment


Added on  2022/09/16

Homework Assignment
AI Summary
This assignment presents a comprehensive statistical analysis of property data, examining various aspects such as assessed values, market values, and assessment-sales ratios. It begins with calculating and comparing mean, median, and range for different valuation methods, highlighting the impact of different approaches. The analysis then delves into measures of dispersion like standard deviation, variance, and coefficient of variation to understand the spread of data. Correlation coefficients are computed to assess relationships between variables like assessed value, number of bedrooms, living area, and lot size, leading to the development of a linear regression model to predict property prices based on living area. Furthermore, a multiple regression analysis is performed, comparing different models to identify the most effective variables for property appraisal, considering factors like adjusted R-squared, standard error, and the significance of various predictors. The document also discusses the benefits and drawbacks of employing multiple regression analysis in property valuation, offering a complete overview of statistical techniques in real estate valuation.
Document Page
Maths
Students Name:
Subject:
Professor’s Name:
Date:
First name 2
1.
a). The calculated mean, median and range for ASSESVAL, COSTVALU, and MRKTVALU are
as shown below:
          Mean     Median   Range
ASSESVAL  330,227  311,400  861,500
COSTVALU  376,451  362,126  999,036
MRKTVALU  377,283  356,582  989,264
The assessed value in the previous year (ASSESVAL) has a mean of 330,227, a median of
311,400 and a range of 861,500. The market value estimated by applying the cost approach
to value (COSTVALU) has a mean of 376,451, a median of 362,126 and a range of 999,036.
The market value estimated by applying the direct comparison approach to value
(MRKTVALU) has a mean of 377,283, a median of 356,582 and a range of 989,264.
The mean and median of last year's estimates are lower than the mean and median of this
year's estimates. The differences in the statistics arise from the different methods used
to estimate value. Of this year's two estimates, the direct comparison approach
(MRKTVALU) has the narrower range (989,264 against 999,036), so on this measure it is
slightly more uniform than the cost approach (COSTVALU). The range depends only on the
two most extreme values, so on its own it is weak evidence of uniformity.
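The three statistics discussed above can be computed as in the following minimal sketch. The values in `sample_values` are hypothetical and are not taken from the assignment data; the helper name `summarize` is also an assumption made for illustration.

```python
from statistics import mean, median

# Hypothetical assessed values, for illustration only (not the assignment data).
sample_values = [250_000, 300_000, 310_000, 340_000, 900_000]


def summarize(values):
    """Return the mean, median and range (maximum minus minimum) of the values."""
    return {
        "mean": mean(values),
        "median": median(values),
        "range": max(values) - min(values),
    }


summary = summarize(sample_values)
print(summary)
```

In this made-up sample a single high value pulls the mean well above the median and also sets the range, which is why the range on its own says little about how uniform the bulk of the estimates are.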
b). The calculated mean, median and maximum of ORIGASR, COSTASR, and MRKTASR
are as shown below:
         Mean  Median  Maximum
ORIGASR  87    87      113
COSTASR  100   99      130
MRKTASR  100   101     128
The assessment-sales ratio indicated by last year's assessed value and the sale price
(ORIGASR) has a mean of 87, with a median of 87 and a maximum of 113. The
assessment-sales ratio indicated by the cost value and the sale price (COSTASR) has a mean
of 100, with a median of 99 and a maximum of 130. The assessment-sales ratio indicated by
the direct comparison value and the sale price (MRKTASR) has a mean of 100, with a median
of 101 and a maximum of 128.
The ASR (assessment-sales ratio) is the ratio of the assessed value to the sale price of a
property, expressed as a percentage. This year's values provide the most accurate
prediction of the sale price, since the means of COSTASR and MRKTASR are both 100%,
whereas last year's assessed values averaged only 87% of the sale price. In general, the
assessed value of a property should differ from its sale price as little as possible, so
the ratio should be as close to 100% as possible.
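As a minimal sketch of the ratio defined above, the function below expresses an assessed value as a percentage of the sale price. The function name and the example figures are hypothetical and are not drawn from the data set.

```python
def assessment_sales_ratio(assessed_value, sale_price):
    """Return the assessed value as a percentage of the sale price."""
    if sale_price <= 0:
        raise ValueError("sale_price must be positive")
    return 100.0 * assessed_value / sale_price


# Hypothetical property: assessed at 348,000 and sold for 400,000.
ratio = assessment_sales_ratio(348_000, 400_000)
print(f"{ratio:.1f}")  # a ratio below 100 means the assessment is under the sale price
```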
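c). The best measure of central tendency here is the median. For each of the three values
the mean exceeds the median, which suggests that a few high-value properties pull the mean
upward; the median is not affected by such outliers. The range is a measure of dispersion
rather than of central tendency, but it can be used to judge whether the data has outliers
and how extreme they are.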
2. The standard deviation measures how spread out the values are around the mean. The
variance is the average of the squared deviations from the mean, that is, the square of
the standard deviation. The coefficient of variation measures relative variability: the
standard deviation divided by the mean. The three measures are as shown below:
Standard Deviation Variance Coefficient of Variation
ASSESVAL 87,664 7,685,059,985 0.27
COSTVALU 91,928 8,450,718,546 0.24
MRKTVALU 95,044 9,033,387,522 0.25
The assessed value in the previous year (ASSESVAL) has a standard deviation of 87,664 and
a variance of 7,685,059,985. The market value estimated by applying the cost approach to
value (COSTVALU) has a standard deviation of 91,928 and a variance of 8,450,718,546. The
market value estimated by applying the direct comparison approach to value (MRKTVALU)
has a standard deviation of 95,044 and a variance of 9,033,387,522. Thus, in absolute
terms, ASSESVAL values are closest to their mean and MRKTVALU values are the furthest.
Relative to the mean, however, COSTVALU has the least variability with a coefficient of
variation of 0.24, followed by MRKTVALU with 0.25, while ASSESVAL has the most with 0.27.
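The coefficient of variation can be rechecked from the figures already reported, as in this short sketch. It assumes only the means from question 1 and the standard deviations from question 2; the dictionary layout and variable names are illustrative.

```python
# (mean, standard deviation) pairs as reported in questions 1 and 2.
reported = {
    "ASSESVAL": (330_227, 87_664),
    "COSTVALU": (376_451, 91_928),
    "MRKTVALU": (377_283, 95_044),
}

# Coefficient of variation = standard deviation divided by mean.
cv = {name: sd / avg for name, (avg, sd) in reported.items()}

for name, value in cv.items():
    print(f"{name}: {value:.2f}")

# Variable with the smallest relative variability.
lowest = min(cv, key=cv.get)
print("Lowest coefficient of variation:", lowest)
```

The recomputed values round to 0.27, 0.24 and 0.25, matching the table.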
3. The correlation coefficients of ASSESVAL, BEDROOMS, LIVAREA and LOT_SIZE are
as shown below:
          ASSESVAL  BEDROOMS  LIVAREA  LOT_SIZE
ASSESVAL  1
BEDROOMS  0.44      1
LIVAREA   0.80      0.43      1
LOT_SIZE  0.43      0.29      0.34     1
The correlation coefficient measures the strength and direction of the linear relationship
between two variables. It ranges from -1 to 1, and the closer its absolute value is to 1,
the stronger the relationship. The correlation coefficients between ASSESVAL and
BEDROOMS, LIVAREA and LOT_SIZE are 0.44, 0.80 and 0.43 respectively. ASSESVAL therefore
has a weak but positive relationship with BEDROOMS and LOT_SIZE, and a strong and positive
relationship with LIVAREA. Hence, of the three variables, the total finished area of the
house including the basement (LIVAREA) is the most useful single predictor of the assessed
value of a property.
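For reference, a Pearson correlation coefficient of the kind reported above can be computed as in the following sketch. The two short lists are hypothetical stand-ins for LIVAREA and ASSESVAL, not the Edmonton data, and `pearson_r` is a helper name chosen for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical living areas (square feet) and assessed values.
living_area = [1_000, 1_500, 2_000, 2_500, 3_000]
assessed = [200_000, 260_000, 310_000, 330_000, 420_000]

r = pearson_r(living_area, assessed)
print(f"r = {r:.2f}")
```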
4. A linear regression model was created using PRICE as the dependent variable and
LIVAREA as the independent variable.
a). The best fit regression equation model is as shown below:
PRICE = 107,060 + 147.51*LIVAREA
b). For a property with 3,000 square feet of living area, the predicted price of the
property is:
PRICE = 107,060 + 147.51*3,000
PRICE = 549,590
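The same calculation can be written as a small function. The intercept and slope are the coefficients of the fitted equation in (a); the names `INTERCEPT`, `SLOPE` and `predict_price` are illustrative.

```python
INTERCEPT = 107_060  # constant term of the fitted equation in (a)
SLOPE = 147.51       # price change per square foot of living area


def predict_price(livarea_sqft):
    """Predicted PRICE from the simple regression on LIVAREA."""
    return INTERCEPT + SLOPE * livarea_sqft


predicted = predict_price(3_000)
print(f"{predicted:,.0f}")  # prints 549,590
```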
c). The scattergram with PRICE on the Y axis and LIVAREA on the X axis is as shown
below:
[Figure: Scattergram of PRICE (Y axis, 0 to 1,200,000) against LIVAREA (X axis, 500 to 5,500)]
The approximate sale price of a property with 3,000 square feet of living area is between
500,000 and 600,000.
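d). The answers to (b) and (c) are consistent, since both rest on the same line of best
fit. The calculated value of 549,590 falls within the 500,000 to 600,000 interval read
from the scattergram; the equation simply gives a more precise figure than a visual
reading of the chart.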
e). The standard error of the estimate (SEE) is 64,722.6. The SEE measures the typical
distance between the observed sale prices and the prices predicted by the regression line.
Hence, the data points lie about 64,722 from the fitted line on average.
The average of PRICE is 379,800. Thus, SEE as a percentage of the mean is 17%. With a
coefficient of variation of 17%, the dispersion of prices around the fitted line is
relatively low compared with the mean price.
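The percentage quoted above follows directly from the two reported figures, as this short check shows. The variable names are illustrative.

```python
see = 64_722.6        # standard error of the estimate reported in (e)
mean_price = 379_800  # average of PRICE reported in (e)

# SEE expressed as a percentage of the mean sale price.
see_pct = 100 * see / mean_price
print(f"{see_pct:.1f}%")
```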
f). The median and the standard deviation of ESTASR, together with those of ORIGASR,
COSTASR and MRKTASR, are as shown below:
Median Standard deviation
ORIGASR 87.28 7.32
COSTASR 99.21 9.70
MRKTASR 100.58 9.35
ESTASR 102 15
The ESTASR, which uses only living area as the independent variable, is not a good
predictive model for sale price, since its standard deviation (15) is much larger than
those of the other three ratios (7.32 to 9.70). Its ratios therefore lie further from
their mean on average.
5. From the multiple regression analysis used to prepare a mass appraisal model for the
Edmonton data with four variations, it was seen that:
a). The adjusted R Square of regression 1 is higher than the adjusted R Square of the
regression in question 4. Hence, regression 1 is better, since 59.9% of the variability
in price is explained by the factors in the model compared to 58% of the variability in
the regression in question 4.
b). Price of the 3 regression models:
Price
Regression 1 458,761.65
Regression 2 454,571.05
Regression 3 467,203.03
The prices derived from the three regressions differ because each regression uses a
different set of variables, so each produces different coefficients for the predictors.
c). The variables that are important in regression three are LIVAREA, LOT_SIZE,
FAMILYRM, and BEDROOMS. The four are important since they are statistically
significant, with p-values below 0.05. FIREPLCS ranks fifth but, with a significance of
0.18, is not statistically significant at that level.
The top five variables are as shown below:
Rank Variable Significance
1 LIVAREA 0.00
2 LOT_SIZE 0.00
3 FAMILYRM 0.00
4 BEDROOMS 0.01
5 FIREPLCS 0.18
d). Even when two independent variables such as WIDTH and DEPTH are related, each
coefficient is still explained in the usual way: the coefficient of WIDTH is the change in
the dependent variable for a one-unit change in WIDTH with DEPTH held constant, and
likewise for DEPTH. However, the significance level of each of the two should be
considered first to determine whether it has a statistically significant impact on the
dependent variable, because correlation between predictors inflates the standard errors of
their coefficients.
e). The R^2, SEE, F-Value and the COV of the four regression models are as shown below:
Model  R Square  Std. Error of the Estimate  F        COV
1      .603      62,923.549                  194.811  0.1657
2      .613      62,315.672                  101.075  0.1641
3      .631      61,121.412                  72.053   0.1609
4      .631      61,206.471                  61.630   0.1612
Regressions 3 and 4 have the highest R squared, at 0.631. However, model 3 has a lower SEE
and COV, and a higher F-value, than model 4, which makes it superior to model 4. Hence,
Regression 3 is the best model.
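The comparison can be reproduced from the reported figures, as in the sketch below. It assumes that the COV column is the SEE divided by the mean PRICE of 379,800 given in question 4(e), which is consistent with the tabulated values; the dictionary layout and the tie-breaking rule (highest R squared, then lowest SEE) are illustrative choices.

```python
MEAN_PRICE = 379_800  # average of PRICE reported in question 4(e)

# R squared and standard error of the estimate for the four models in (e).
models = {
    1: {"r2": 0.603, "see": 62_923.549},
    2: {"r2": 0.613, "see": 62_315.672},
    3: {"r2": 0.631, "see": 61_121.412},
    4: {"r2": 0.631, "see": 61_206.471},
}


def cov(see):
    """Coefficient of variation, assumed here to be SEE divided by mean PRICE."""
    return see / MEAN_PRICE


# Keep the models with the highest R squared, then pick the lowest SEE among them.
best_r2 = max(m["r2"] for m in models.values())
candidates = [k for k, m in models.items() if m["r2"] == best_r2]
best = min(candidates, key=lambda k: models[k]["see"])

for k, m in models.items():
    print(f"Model {k}: R2 = {m['r2']:.3f}, COV = {cov(m['see']):.4f}")
print("Preferred model:", best)
```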
f). The benefits of using multiple regression analysis models for property appraisal are
that one can determine the relative influence of one or more predictor variables on the
value of the criterion (dependent) variable. Moreover, one can identify anomalies or
outliers in the data.
The drawbacks of multiple regression boil down to the data being used. Two common
problems are working with incomplete data and falsely concluding that a correlation
implies causation.