Statistics Assignment: Regression Analysis of Property Values

Added on  2023/01/11

Homework Assignment
AI Summary
This statistics assignment solution presents an analysis of property values using regression models. The assignment explores linear and multiple regression techniques, examining the relationship between property values and factors like lot size, gross area, age of the house, and remodeling. The solution includes scatter plots, regression outputs, residual analysis, and interpretations of coefficients and R-squared values. It tests for the significance of predictor variables and builds predictive models. The assignment progresses from simple linear regression to multiple regression, incorporating more variables and evaluating their impact on the total property value. The final section uses a developed regression model to predict property values based on different property characteristics, with the aim of identifying significant predictors and building an effective predictive model. The analysis highlights the importance of statistical methods in understanding and predicting real estate values.
Statistics
Student name:
Tutor name:
Question 1
Figure 1: Scatterplot of total value (in 000s) against LOT SQ FT, with fitted line f(x) = 0.00968278 x + 302.006 (R² = 0.1105).
As can be observed from the scatter diagram above, there is a positive but weak linear relationship between the total value (in thousands of dollars) and lot size in square feet: the line of best fit slopes upward, but R² is only about 0.11.
Results of the simple linear regression of total property value on lot size (LOT SQFT)
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.33244392
R Square 0.11051896
Adjusted R Square 0.11030918
Standard Error 49.582376
Observations 4242
ANOVA
df SS MS F Significance F
Regression 1 1295151.669 1295151.67 526.8245 5.439E-110
Residual 4240 10423666.9 2458.41201
Total 4241 11718818.57
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 302.006046 2.515641449 120.051308 0 297.074071 306.9380202
LOT SQFT 0.00968278 0.000421859 22.9526577 5.4E-110 0.00885572 0.010509846
Table 1
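The summary statistics in Table 1 are tied together by a few identities (R² = SS regression / SS total, F = MS regression / MS residual, t = coefficient / standard error). The short Python sketch below recomputes them from the figures printed in the table; the variable names are illustrative and not part of the original output.

```python
import math

# Values copied from the ANOVA block of Table 1.
ss_regression = 1295151.669
ss_residual = 10423666.9
ss_total = 11718818.57
df_regression, df_residual = 1, 4240

# Slope and its standard error from the coefficient block of Table 1.
slope, slope_se = 0.00968278, 0.000421859

r_squared = ss_regression / ss_total            # share of variation explained
multiple_r = math.sqrt(r_squared)               # "Multiple R" in the output
ms_residual = ss_residual / df_residual         # mean squared error
f_stat = (ss_regression / df_regression) / ms_residual
standard_error = math.sqrt(ms_residual)         # "Standard Error" of the regression
t_slope = slope / slope_se                      # t statistic for LOT SQFT

print(f"R^2 = {r_squared:.4f}, F = {f_stat:.1f}, "
      f"s = {standard_error:.2f}, t = {t_slope:.2f}")
```

With a single predictor, the square of the slope's t statistic equals the F statistic up to rounding, which is a quick way to confirm the table was transcribed correctly.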
The regression model is as below:
Total value (000s) = 0.0097 (LOT SQFT) + 302
The y-intercept = 302. This means that when LOT SQFT is zero, the predicted value of the property is 302 thousand dollars. The slope means that each additional square foot of lot is associated with an increase of about 0.0097 thousand dollars (roughly $9.70) in total value.
The coefficient of determination R² = 0.11. This means that 11% of the variation that occurs in the dependent variable “total value” is explained by the independent variable “LOT SQ FT”.
Residual analysis
Figure 1: LOT SQFT residual plot (residuals against LOT SQFT).
It can be observed that the points do not show a curved pattern, which suggests that the linearity assumption is reasonable.
The assumption of constant variance (homoscedasticity) appears to be violated, since the points are unequally spread around the zero residual line.
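The visual check for constant variance can be backed by a rough numeric one: fit the line, sort the residuals by the predictor, and compare the mean squared residual in the upper half of the predictor range with that in the lower half. The sketch below uses made-up data, not the assignment's data set, and the function names are hypothetical; a ratio far from 1 hints at heteroscedasticity, although it is not a formal test.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residual_variance_ratio(xs, ys):
    """Mean squared residual in the upper half of x divided by the lower half."""
    intercept, slope = fit_line(xs, ys)
    pairs = sorted(zip(xs, ys))                      # order observations by x
    residuals = [y - (intercept + slope * x) for x, y in pairs]
    half = len(residuals) // 2
    lower = sum(r * r for r in residuals[:half]) / half
    upper = sum(r * r for r in residuals[half:]) / (len(residuals) - half)
    return upper / lower


# Made-up illustration data: the scatter around the line grows with x.
xs = list(range(1, 41))
ys_growing = [2 + 0.5 * x + (0.1 * x if i % 2 == 0 else -0.1 * x)
              for i, x in enumerate(xs)]
# Made-up comparison data: the scatter is the same size everywhere.
ys_constant = [2 + 0.5 * x + (1 if i % 2 == 0 else -1)
               for i, x in enumerate(xs)]

ratio_growing = residual_variance_ratio(xs, ys_growing)
ratio_constant = residual_variance_ratio(xs, ys_constant)
print(f"growing spread: {ratio_growing:.1f}, constant spread: {ratio_constant:.2f}")
```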
Question 2
Scatterplot of total value and gross area
Figure 2: Scatterplot of total value against gross area, with fitted line f(x) = 0.0537474 x + 215.003 (R² = 0.2964).
As can be observed from the scatter diagram above, there is a positive, moderate linear relationship between the total value in thousands and gross area: the line of best fit slopes upward and R² is about 0.30.
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.544391
R Square 0.296362
Adjusted R Square 0.296196
Standard Error 44.09951
Observations 4242
ANOVA
df SS MS F Significance F
Regression 1 3473007 3473007 1785.822 0
Residual 4240 8245812 1944.767
Total 4241 11718819
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 215.0029 3.428617 62.70835 0 208.281 221.7248
GROSS AREA 0.053747 0.001272 42.25898 0 0.051254 0.056241
Table 2
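The 95% confidence limits in Table 2 follow from coefficient ± critical value × standard error. The sketch below reproduces the GROSS AREA interval, assuming the normal critical value (about 1.96) is an adequate stand-in for the t critical value with 4240 degrees of freedom; that substitution is an assumption of this example, not something stated in the output.

```python
from statistics import NormalDist

# GROSS AREA coefficient and standard error from Table 2.
coef, se = 0.053747, 0.001272

# Assumption: with 4240 residual df the t critical value is practically
# the normal one, so the standard normal quantile is used here.
z = NormalDist().inv_cdf(0.975)

lower, upper = coef - z * se, coef + z * se
print(f"95% CI for the GROSS AREA slope: ({lower:.6f}, {upper:.6f})")
```

Because the interval excludes zero, it agrees with the near-zero p-value reported for GROSS AREA.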
The regression model is as below:
Total value (000s) = 0.054 (gross area) + 215
The y-intercept = 215. This means that when gross area is zero, the predicted value of the property is 215 thousand dollars. The slope means that each additional unit of gross area is associated with an increase of about 0.054 thousand dollars (roughly $54) in total value.
The coefficient of determination R² = 0.30. This means that 30% of the variation that occurs in the dependent variable “total value” is explained by the independent variable “gross area”.
Residual analysis
Figure 3: GROSS AREA residual plot (residuals against GROSS AREA).
It can be observed that the points do not show a curved pattern, which suggests that the linearity assumption is reasonable.
The assumption of constant variance (homoscedasticity) appears to be violated, since the points are unequally spread around the zero residual line.
Question 3
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.601665
R Square 0.362001
Adjusted R Square 0.361549
Standard Error 42.00215
Observations 4242
ANOVA
df SS MS F Significance F
Regression 3 4242222 1414074 801.5473 0
Residual 4238 7476597 1764.18
Total 4241 11718819
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 176.7974 4.082895 43.30197 0 168.7927852 184.802
LOT SQFT 0.007553 0.000363 20.82905 8.14E-92 0.00684186 0.008264
AGE OF HOUSE 0.094546 0.030454 3.104497 0.001919 0.034839042 0.154252
GROSS AREA 0.049052 0.001258 38.99011 1.9E-284 0.046585691 0.051519
Table 3
a. The multiple regression equation is:
Total value (000s) = 0.008 (lot sqft) + 0.09 (age of house) + 0.05 (gross area) + 176.8
b. Residual analysis
Figure 4: Residual plots for LOT SQFT, AGE OF HOUSE, and GROSS AREA (residuals against each predictor).
It can be observed that the points do not show a curved pattern for any of the three independent variables, which suggests that the linearity assumption is reasonable.
However, the assumption of constant variance (homoscedasticity) appears to be violated, since the points for gross area and age of the house are unequally spread around the zero residual line.
c. Test for relationship between the independent and dependent variables
ANOVA
df SS MS F Significance F
Regression 3 4242222 1414074 801.5473 0
Residual 4238 7476597 1764.18
Total 4241 11718819
Table 4
Since the computed p-value (Significance F, reported as 0.00) is less than the level of significance (0.05), we can conclude that there is a significant relationship between the predictor variables and the response variable.
d. The p-value = 0.00. This indicates that the model is significant.
e. The adjusted R² = 0.36
f. Significance of the individual variables
              P-value
Intercept     0
LOT SQFT      8.14E-92
AGE OF HOUSE  0.001919
GROSS AREA    1.9E-284
Table 5
All the predictor variables were significant since all their p-values (the largest being 0.0019, for age of house) were less than the level of significance (0.05).
g. The most significant predictor was gross area since it had the smallest p-value (1.9E-284).
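The adjusted R² reported in part e penalises R² for the number of predictors: adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), where n is the number of observations and k the number of predictors. The sketch below applies this to the figures in Table 3; the function name is illustrative.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# R Square and Observations from Table 3; the model has three predictors.
adj_three_predictors = adjusted_r_squared(0.362001, 4242, 3)
print(f"adjusted R^2 = {adj_three_predictors:.6f}")
```

With 4242 observations the penalty is tiny, which is why the adjusted value (about 0.3615) sits so close to R² itself (0.3620).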
Question 4
Regression analysis results
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.605418
R Square 0.366531
Adjusted R Square 0.365933
Standard Error 41.8577
Observations 4242
ANOVA
df SS MS F Significance F
Regression 4 4295312 1073828 612.8921 0
Residual 4237 7423507 1752.067
Total 4241 11718819
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 171.8965 4.165123 41.27044 0 163.7306 180.0623
LOT SQFT 0.007602 0.000361 21.0302 1.78E-93 0.006893 0.00831
AGE OF HOUSE 0.080937 0.03045 2.658021 0.00789 0.021239 0.140636
GROSS AREA 0.048451 0.001258 38.49954 2.4E-278 0.045984 0.050918
REMODEL 5.585519 1.014691 5.504649 3.92E-08 3.596193 7.574846
Table 6
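Whether adding REMODEL improves on the three-predictor model can be checked with a partial F-test that compares the residual sums of squares in Table 3 and Table 6. The sketch below is an illustration built only from those two tables; when a single term is added, the partial F should equal the square of that term's t statistic, up to rounding.

```python
# Residual sum of squares and residual df without REMODEL (Table 3).
sse_reduced, df_reduced = 7476597, 4238
# Residual sum of squares and residual df with REMODEL (Table 6).
sse_full, df_full = 7423507, 4237

extra_terms = df_reduced - df_full                 # one added predictor
partial_f = ((sse_reduced - sse_full) / extra_terms) / (sse_full / df_full)

t_remodel = 5.504649                               # t Stat for REMODEL, Table 6
print(f"partial F = {partial_f:.2f}, t^2 = {t_remodel ** 2:.2f}")
```

A partial F of about 30 on 1 and 4237 degrees of freedom is far above conventional 5% critical values, in line with the small p-value reported for REMODEL, even though R² rises only from 0.362 to 0.367.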
a. The multiple regression equation is as below:
Total value (000s) = 0.008 (lot sqft) + 0.08 (age of house) + 0.05 (gross area) + 5.59 (remodel) + 171.9
b. Residual analysis
Residual plots for LOT SQFT, AGE OF HOUSE, GROSS AREA, and REMODEL (residuals against each predictor).
It can be observed that the points do not show a curved pattern for any of the four independent variables, which suggests that the linearity assumption is reasonable.
However, the assumption of constant variance (homoscedasticity) appears to be violated, since the points for gross area and age of the house are unequally spread around the zero residual line.
c. The Significance F is reported as zero. This means that there is a significant relationship between the predictor variables and the response variable.
d. The p-value = 0.00. This means that the model is a significant predictor of the dependent variable.
e. The adjusted R² = 0.37
f. All the variables are significant.
P-value
Intercept 0
LOT SQFT 1.78158E-93
AGE OF HOUSE 0.007889632
GROSS AREA 2.4409E-278
REMODEL 3.91772E-08
Table 7
From Table 7 above, the variables with p-values less than the level of significance (0.05) are considered significant predictors. In this case all the variables are significant.
g. Predicting the total value
The regression model to be used:
Total value (000s) = 0.008 (lot sqft) + 0.08 (age of house) + 0.05 (gross area) + 5.59 (remodel) + 171.9
TOTAL VALUE  LOT SQFT  AGE OF HOUSE  GROSS AREA  REMODEL
505.538      18256     25            3600        1
529.24       12000     2             5000        2
432.63       8000      62            3500        3
             0.008     0.08          0.05        5.59
Table 8
The predicted total values are as shown in Table 8 above. The last row lists the rounded coefficients applied to each column, and the intercept of 171.9 from Table 6 is added to each row.
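The predictions can also be computed from the unrounded coefficients in Table 6 rather than the rounded ones. The sketch below does this for the three houses in Table 8; the dictionary layout and function name are illustrative choices. The results come out somewhat different from the rounded-coefficient figures, mainly because 0.008 overstates the LOT SQFT coefficient of 0.007602 and the lot sizes are large.

```python
# Unrounded coefficients and intercept from Table 6.
INTERCEPT = 171.8965
COEFFICIENTS = {
    "LOT SQFT": 0.007602,
    "AGE OF HOUSE": 0.080937,
    "GROSS AREA": 0.048451,
    "REMODEL": 5.585519,
}


def predict_total_value(house):
    """Predicted total value in thousands of dollars for one house."""
    return INTERCEPT + sum(COEFFICIENTS[name] * house[name] for name in COEFFICIENTS)


# Property characteristics taken from Table 8.
houses = [
    {"LOT SQFT": 18256, "AGE OF HOUSE": 25, "GROSS AREA": 3600, "REMODEL": 1},
    {"LOT SQFT": 12000, "AGE OF HOUSE": 2, "GROSS AREA": 5000, "REMODEL": 2},
    {"LOT SQFT": 8000, "AGE OF HOUSE": 62, "GROSS AREA": 3500, "REMODEL": 3},
]

predictions = [predict_total_value(house) for house in houses]
for house, value in zip(houses, predictions):
    print(f"lot {house['LOT SQFT']:>6} sq ft -> {value:.2f} thousand dollars")
```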
Question 5
              Coefficients  Standard Error  t Stat    P-value      Lower 95%    Upper 95%
Intercept     94.93725      6.04021817      15.71752  3.80484E-54  83.0952525   106.7793
LOT SQFT      0.008094      0.000283021     28.59697  1.3739E-164  0.00753866   0.008648
AGE OF HOUSE  0.011696      0.027095777     0.431638  0.666026543  -0.04142638  0.064818
GROSS AREA    0.025864      0.001661332     15.56802  3.50197E-53  0.022606573  0.029121
LIVING AREA   0.02484       0.003128673     7.939416  2.58215E-15  0.018705993  0.030974
FLOORS        39.49879      1.574879627     25.08052  1.5527E-129  36.41120118  42.58638
ROOMS         2.615862      0.619120744     4.225124  2.43841E-05  1.40206037   3.829664
BEDROOMS      -0.39836      0.952483908     -0.41824  0.675794375  -2.26573354  1.469004
FULL BATH     8.930323      1.406323026     6.350122  2.37822E-10  6.173190953  11.68745
HALF BATH     11.46528      1.131996586     10.12837  7.71846E-24  9.245970734  13.68459
KITCHEN       -4.26085      4.479641313     -0.95116  0.341577883  -13.0433016  4.521597
FIREPLACE     15.71381      1.000083134     15.7125   4.10025E-54  13.7531226   17.6745
REMODEL       5.019054      0.790198024     6.351641  2.35507E-10  3.469851074  6.568257
Table 9
From the p-values above, it can be observed that age of house (p = 0.67), bedrooms (p = 0.68) and kitchen (p = 0.34) are not significant at the 0.05 level and therefore can be removed from the model.
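The screening step described above amounts to comparing each predictor's p-value with the 0.05 level of significance. The sketch below does so for the p-values in Table 9; the dictionary and variable names are illustrative.

```python
ALPHA = 0.05  # level of significance used throughout the assignment

# P-values for each predictor, copied from Table 9 (intercept omitted).
P_VALUES = {
    "LOT SQFT": 1.3739e-164,
    "AGE OF HOUSE": 0.666026543,
    "GROSS AREA": 3.50197e-53,
    "LIVING AREA": 2.58215e-15,
    "FLOORS": 1.5527e-129,
    "ROOMS": 2.43841e-05,
    "BEDROOMS": 0.675794375,
    "FULL BATH": 2.37822e-10,
    "HALF BATH": 7.71846e-24,
    "KITCHEN": 0.341577883,
    "FIREPLACE": 4.10025e-54,
    "REMODEL": 2.35507e-10,
}

not_significant = sorted(name for name, p in P_VALUES.items() if p > ALPHA)
significant = sorted(name for name, p in P_VALUES.items() if p <= ALPHA)
print("candidates for removal:", not_significant)
```

Dropping variables changes the remaining coefficients, so the reduced model would normally be refitted before it is used for prediction.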
So the model will look as below:
Total value (000s) = 0.008 (lot sqft) + 0.02 (living area) + 0.025 (gross area) + 5.01 (remodel) + 39.5 (floors) + 2.62 (room…
Yes, I would use the model to predict the total value of houses in Boston, because the model is statistically significant.
The total prices calculated according to the model are as shown in the table below;
TOTAL VALUE  LOT SQFT  AGE OF HOUSE  GROSS AREA  REMODEL  LIVING AREA  FLOORS  ROOMS  BEDROOMS  FULL BATH  HALF BATH  KITCHEN  FIREPLACE
535.728      18256     25            3600        1        2588         2       8      5         1          1          1        1
592.42       12000     2             5000        2        3500         3       9      6         1          1          1        1
428.24       8000      62            3500        3        2156         1.5     6      3         1          1          1        1
             0.008     0.01          0.03        5.01     0.02         40      2.62   -0.4      8.9        11.46      -4.3     15.7
Table 10