Predicting Selling Prices of Houses - Statistics Study
Added on  2023/04/23
This study analyzes the costs of homes in Sydney, Australia, based on factors such as local selling prices, number of bathrooms, area of the site, size of living space, number of garages, rooms, bedrooms, age of the house, and number of fireplaces.
Table of Contents
Introduction
Methodology
Data
  Description of the variables
Data Analysis
  Descriptive Statistics
  Measure of association
  Regression analysis
Conclusion

List of tables
Table 1: Dataset
Table 2: Variable names
Table 3: Description of variables
Table 4: Descriptive statistics
Table 5: Correlations
Table 6: SUMMARY OUTPUT
Table 7: ANOVA
Table 8: Coefficients table
Introduction
Housing is a basic human need: every human being deserves a place of shelter. The purpose of this study was to analyze house prices in light of several factors. The parameters considered were the local selling prices, the number of bathrooms, the area of the site in thousands of square feet, the size of the living space in thousands of square feet, the number of garages, the number of rooms, the number of bedrooms, the age of the house in years and the number of fireplaces (Vigenia & Kritikos, 2004). The population of interest is the prices of houses in Sydney, Australia, and a sample was drawn from this population. The main aim of this report is to improve understanding of how the different features of a home affect its selling price (Boddy & Smith, 2009). The study is relevant to homeowners and real estate agents who buy and sell houses, and to government authorities involved in regulating housing costs (Kucukmehmetoglu & Geymen, 2008).
Methodology
Data for this study were retrieved from https://people.sc.fsu.edu/~jburkardt/datasets/regression/x26.txt. The data are cross-sectional, with 28 observations and 10 variables. Both descriptive and inferential statistics were used to analyze the relationships among these variables. Pearson correlation tests and a multiple linear regression model were used to identify the strength and direction of the relationships between the variables. For the regression analysis, we sought to estimate the following model:
B = β0 + β1A1 + β2A2 + β3A3 + β4A4 + β5A5 + β6A6 + β7A7 + β8A8 + β9A9 + ε
where the variables are defined as follows:
Variable code   Variable name
A1              The local selling prices, in hundreds of dollars
A2              The number of bathrooms
A3              The area of the site in thousands of square feet
A4              The size of the living space in thousands of square feet
A5              The number of garages
A6              The number of rooms
A7              The number of bedrooms
A8              The age in years
A9              The number of fireplaces
B               Selling price

Here β0 is the constant (intercept) coefficient, β1 to β9 are the coefficients of the independent variables A1 to A9 respectively, and ε is the error term.

Data
As mentioned in the methodology section, data for this study were retrieved from https://people.sc.fsu.edu/~jburkardt/datasets/regression/x26.txt. The data are cross-sectional, with 28 observations and 10 variables.

Table 1: Dataset
Index  A1       A2    A3       A4      A5    A6   A7   A8   A9   B
1      4.9176   1     3.472    0.998   1     7    4    42   0    25.9
2      5.0208   1     3.531    1.5     2     7    4    62   0    29.5
3      4.5429   1     2.275    1.175   1     6    3    40   0    27.9
4      4.5573   1     4.05     1.232   1     6    3    54   0    25.9
5      5.0597   1     4.455    1.121   1     6    3    42   0    29.9
6      3.891    1     4.455    0.988   1     6    3    56   0    29.9
7      5.898    1     5.85     1.24    1     7    3    51   1    30.9
8      5.6039   1     9.52     1.501   0     6    3    32   0    28.9
9      16.42    2.5   9.8      3.42    2     10   5    42   1    84.9
10     14.46    2.5   12.8     3       2     9    5    14   1    82.9
11     5.8282   1     6.435    1.225   2     6    3    32   0    35.9
12     5.3003   1     4.9883   1.552   1     6    3    30   0    31.5
13     6.2712   1     5.52     0.975   1     5    2    30   0    31
14     5.9592   1     6.666    1.121   2     6    3    32   0    30.9
15     5.05     1     5        1.02    0     5    2    46   1    30
16     5.6039   1     9.52     1.501   0     6    3    32   0    28.9
17     8.2464   1.5   5.15     1.664   2     8    4    50   0    36.9
18     6.6969   1.5   6.902    1.488   1.5   7    3    22   1    41.9
19     7.7841   1.5   7.102    1.376   1     6    3    17   0    40.5
20     9.0384   1     7.8      1.5     1.5   7    3    23   0    43.9
21     5.9894   1     5.52     1.256   2     6    3    40   1    37.5
22     7.5422   1.5   4        1.69    1     6    3    22   0    37.9
23     8.7951   1.5   9.89     1.82    2     8    4    50   1    44.5
24     6.0931   1.5   6.7265   1.652   1     6    3    44   0    37.9
25     8.3607   1.5   9.15     1.777   2     8    4    48   1    38.9
26     8.14     1     8        1.504   2     7    3    3    0    36.9
27     9.1416   1.5   7.3262   1.831   1.5   8    4    31   0    45.8
28     12       1.5   5        1.2     2     6    3    30   1    41

The variables are defined as follows:

Table 2: Variable names
Variable code   Variable name
A1              The local selling prices, in hundreds of dollars
A2              The number of bathrooms
A3              The area of the site in thousands of square feet
A4              The size of the living space in thousands of square feet
A5              The number of garages
A6              The number of rooms
A7              The number of bedrooms
A8              The age in years
A9              The number of fireplaces
B               Selling price
Description of the variables
The variables used in the study are described in the table below.

Table 3: Description of variables
Variable code   Variable name                                              Variable type          Data type
A1              The local selling prices, in hundreds of dollars           Independent variable   Continuous (numerical)
A2              The number of bathrooms                                    Independent variable   Discrete (numerical)
A3              The area of the site in thousands of square feet           Independent variable   Continuous (numerical)
A4              The size of the living space in thousands of square feet   Independent variable   Continuous (numerical)
A5              The number of garages                                      Independent variable   Discrete (numerical)
A6              The number of rooms                                        Independent variable   Discrete (numerical)
A7              The number of bedrooms                                     Independent variable   Discrete (numerical)
A8              The age in years                                           Independent variable   Continuous (numerical)
A9              The number of fireplaces                                   Independent variable   Discrete (numerical)
B               Selling price                                              Dependent variable     Continuous (numerical)

Data Analysis
Descriptive Statistics
We began by looking at the descriptive statistics for all ten variables, which are given in Table 4 below. The average local selling price (in hundreds of dollars) was 7.22 (SD = 2.96), with a median of 6.04. The standard deviation is about 41% of the mean, so local prices vary moderately around their average. The maximum and minimum local prices were 16.42 and 3.89 respectively. The skewness of 1.76 (greater than 1) shows that local prices are highly positively skewed: a few high values (16.42 and 14.46) pull the mean above the median.
Table 4: Descriptive statistics
Statistic            A1       A2     A3       A4     A5     A6      A7     A8        A9     B
Mean                 7.22     1.27   6.46     1.51   1.34   6.68    3.29   36.32     0.32   38.16
Standard Error       0.56     0.08   0.46     0.10   0.12   0.22    0.13   2.61      0.09   2.68
Median               6.04     1.00   6.14     1.49   1.25   6.00    3.00   36.00     0.00   36.40
Mode                 5.60     1.00   4.46     1.50   1.00   6.00    3.00   32.00     0.00   25.90
Standard Deviation   2.96     0.42   2.43     0.55   0.65   1.16    0.71   13.82     0.48   14.16
Sample Variance      8.74     0.18   5.92     0.30   0.43   1.34    0.51   190.89    0.23   200.41
Kurtosis             3.19     3.72   0.19     6.18   -0.34  1.34    1.01   -0.09     -1.46  6.87
Skewness             1.76     1.91   0.64     2.31   -0.64  1.15    0.84   -0.35     0.81   2.57
Range                12.53    1.50   10.53    2.45   2.00   5.00    3.00   59.00     1.00   59.00
Minimum              3.89     1.00   2.28     0.98   0.00   5.00    2.00   3.00      0.00   25.90
Maximum              16.42    2.50   12.80    3.42   2.00   10.00   5.00   62.00     1.00   84.90
Sum                  202.21   35.50  180.90   42.33  37.50  187.00  92.00  1017.00   9.00   1068.40
Count                28       28     28       28     28     28      28     28        28     28

The average number of bathrooms was 1.27, with a median of 1, a maximum of 2.5 and a minimum of 1. The standard deviation of 0.42 shows that the number of bathrooms does not vary widely (Tze, 2013). The skewness of 1.91 (greater than 1) shows that the distribution is highly positively skewed. The average area of the site was 6.46 thousand square feet, with a median of 6.14 thousand square feet. The standard deviation of the site area was 2.43, which shows that the data are not widely spread out from the mean (Wang & S.-M, 2006). The skewness of 0.64 (between 0.5 and 1) shows that the data are moderately positively skewed. The average size of the living space was 1.51 thousand square feet, with a median of 1.49 thousand square feet. The standard deviation of 0.55 shows that the data are not widely spread out from the mean. The skewness of 2.31 (greater than 1) shows that the data are highly positively skewed.
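The A1 column of Table 4 can be recomputed from the raw values in Table 1. The sketch below is a minimal Python illustration, assuming that the skewness in Table 4 is the adjusted Fisher-Pearson coefficient (the formula used by Excel's SKEW function); the helper names `sample_skewness` and `describe` are hypothetical.

```python
import statistics

# A1 (local selling prices, in hundreds of dollars), copied from Table 1.
A1 = [4.9176, 5.0208, 4.5429, 4.5573, 5.0597, 3.891, 5.898, 5.6039, 16.42, 14.46,
      5.8282, 5.3003, 6.2712, 5.9592, 5.05, 5.6039, 8.2464, 6.6969, 7.7841, 9.0384,
      5.9894, 7.5422, 8.7951, 6.0931, 8.3607, 8.14, 9.1416, 12.0]


def sample_skewness(xs):
    """Adjusted Fisher-Pearson skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s) ** 3),
    where s is the sample (n - 1) standard deviation."""
    n = len(xs)
    mean = statistics.mean(xs)
    s = statistics.stdev(xs)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in xs)


def describe(xs):
    """A subset of the rows of Table 4 for one variable."""
    return {
        "mean": statistics.mean(xs),
        "median": statistics.median(xs),    # average of the 14th and 15th values when n = 28
        "sd": statistics.stdev(xs),         # sample standard deviation
        "skewness": sample_skewness(xs),
        "min": min(xs),
        "max": max(xs),
    }


stats = describe(A1)
for key, value in stats.items():
    print(f"{key:9s}{value:8.2f}")
```

Rounded to two decimals, these should agree with the A1 column of Table 4; the same function can be applied to any other column of Table 1.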
The average number of garages was 1.34, with a median of 1.25. The standard deviation of 0.65 shows that the data are not widely spread out from the mean. The skewness of -0.64 (between -1 and -0.5) shows that the data are moderately negatively skewed.
Measure of association
We performed Pearson correlation tests to investigate the relationships among all ten variables. According to Schouhamer and Weber (2010), the Pearson correlation coefficient lies between -1 and +1: values closer to +1 indicate a stronger positive linear relationship, values closer to -1 a stronger negative linear relationship, and values near 0 little or no linear relationship. Results are given in Table 5 below.

Table 5: Correlations (N = 28 for every pair)
Pearson correlation
      A1      A2      A3      A4      A5      A6      A7      A8      A9      B
A1    1       .881**  .628**  .840**  .514**  .751**  .653**  -.343   .492**  .923**
A2    .881**  1       .583**  .894**  .400*   .757**  .726**  -.201   .481**  .925**
A3    .628**  .583**  1       .681**  .176    .565**  .459*   -.383*  .376*   .667**
A4    .840**  .894**  .681**  1       .364    .841**  .791**  -.177   .372    .922**
A5    .514**  .400*   .176    .364    1       .566**  .540**  -.058   .292    .462*
A6    .751**  .757**  .565**  .841**  .566**  1       .924**  .011    .397*   .777**
A7    .653**  .726**  .459*   .791**  .540**  .924**  1       .107    .265    .701**
A8    -.343   -.201   -.383*  -.177   -.058   .011    .107    1       .091    -.299
A9    .492**  .481**  .376*   .372    .292    .397*   .265    .091    1       .490**
B     .923**  .925**  .667**  .922**  .462*   .777**  .701**  -.299   .490**  1

Sig. (2-tailed)
      A1    A2    A3    A4    A5    A6    A7    A8    A9    B
A1    -     .000  .000  .000  .005  .000  .000  .074  .008  .000
A2    .000  -     .001  .000  .035  .000  .000  .305  .010  .000
A3    .000  .001  -     .000  .372  .002  .014  .044  .048  .000
A4    .000  .000  .000  -     .057  .000  .000  .366  .051  .000
A5    .005  .035  .372  .057  -     .002  .003  .771  .132  .013
A6    .000  .000  .002  .000  .002  -     .000  .954  .037  .000
A7    .000  .000  .014  .000  .003  .000  -     .588  .172  .000
A8    .074  .305  .044  .366  .771  .954  .588  -     .646  .122
A9    .008  .010  .048  .051  .132  .037  .172  .646  -     .008
B     .000  .000  .000  .000  .013  .000  .000  .122  .008  -

**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).

The above results show that the selling price is significantly correlated with eight of the nine independent variables. The only variable without a significant correlation with the selling price is the age of the house in years (r = -0.299, p = 0.122). The number of bathrooms had the strongest correlation with the selling price (r = 0.925, p < 0.001). There was also a very strong relationship between the local selling prices, in hundreds of dollars (A1), and the selling price (B), with a correlation coefficient of 0.923 (p < 0.001). The correlation coefficient between the size of the living space in thousands of square feet (A4) and the selling price (B) was 0.922 (p < 0.001).
Scatter plot
The scatter plot below is that of the selling price versus the local selling prices (Allen, et al., 2008). The scatter plot further confirms that there is a strong positive relationship between the local selling price (A1) and the selling price of the houses (B).
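The coefficients in Table 5 are ordinary Pearson correlations, and a two-tailed significance value of this kind is obtained from the test statistic t = r√(n − 2)/√(1 − r²) with n − 2 degrees of freedom. A minimal Python sketch for the A1 and B columns of Table 1 follows; the function name `pearson_r` is a hypothetical helper, not part of any library.

```python
import math

# A1 and B columns copied from Table 1.
A1 = [4.9176, 5.0208, 4.5429, 4.5573, 5.0597, 3.891, 5.898, 5.6039, 16.42, 14.46,
      5.8282, 5.3003, 6.2712, 5.9592, 5.05, 5.6039, 8.2464, 6.6969, 7.7841, 9.0384,
      5.9894, 7.5422, 8.7951, 6.0931, 8.3607, 8.14, 9.1416, 12.0]
B = [25.9, 29.5, 27.9, 25.9, 29.9, 29.9, 30.9, 28.9, 84.9, 82.9, 35.9, 31.5, 31.0, 30.9,
     30.0, 28.9, 36.9, 41.9, 40.5, 43.9, 37.5, 37.9, 44.5, 37.9, 38.9, 36.9, 45.8, 41.0]


def pearson_r(xs, ys):
    """Sample Pearson correlation: covariance divided by the product of spreads."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(A1, B)
n = len(A1)
# t statistic for H0: rho = 0, compared against Student's t with n - 2 df.
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
print(f"r = {r:.3f}, t = {t:.2f} on {n - 2} df")
```

A t value this far above the usual critical values for 26 degrees of freedom corresponds to the very small p-value reported for the A1 and B pair.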
Regression analysis
In this section, we present the results of the regression analysis. Table 6 below gives the summary output.

Table 6: SUMMARY OUTPUT
Regression Statistics
Multiple R           0.97167
R Square             0.944142
Adjusted R Square    0.916213
Standard Error       4.097823
Observations         28

The R-squared value (R²) of 0.9441 implies that 94.41% of the variation in the dependent variable (selling price) is explained by the nine independent variables in the model. The adjusted R-squared, which penalizes the number of predictors relative to the 28 observations, is 0.9162 and is still very high. Thus a large proportion of the variation in selling prices is explained by the factors within the model, and only a small proportion (about 5.6%) is left to factors outside the model.
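The figures in Table 6 can be reproduced with an ordinary least squares fit of B on A1 to A9. The Python sketch below is a minimal, dependency-free illustration that assumes the rows of Table 1 as input. It solves the normal equations directly, which is adequate for a small problem like this one; a dedicated least-squares routine such as NumPy's `numpy.linalg.lstsq` is numerically safer in general. The helper names `solve` and `fit_ols` are hypothetical.

```python
# Rows of Table 1: (A1, A2, A3, A4, A5, A6, A7, A8, A9, B).
ROWS = [
    (4.9176, 1.0, 3.472, 0.998, 1.0, 7, 4, 42, 0, 25.9),
    (5.0208, 1.0, 3.531, 1.5, 2.0, 7, 4, 62, 0, 29.5),
    (4.5429, 1.0, 2.275, 1.175, 1.0, 6, 3, 40, 0, 27.9),
    (4.5573, 1.0, 4.05, 1.232, 1.0, 6, 3, 54, 0, 25.9),
    (5.0597, 1.0, 4.455, 1.121, 1.0, 6, 3, 42, 0, 29.9),
    (3.891, 1.0, 4.455, 0.988, 1.0, 6, 3, 56, 0, 29.9),
    (5.898, 1.0, 5.85, 1.24, 1.0, 7, 3, 51, 1, 30.9),
    (5.6039, 1.0, 9.52, 1.501, 0.0, 6, 3, 32, 0, 28.9),
    (16.42, 2.5, 9.8, 3.42, 2.0, 10, 5, 42, 1, 84.9),
    (14.46, 2.5, 12.8, 3.0, 2.0, 9, 5, 14, 1, 82.9),
    (5.8282, 1.0, 6.435, 1.225, 2.0, 6, 3, 32, 0, 35.9),
    (5.3003, 1.0, 4.9883, 1.552, 1.0, 6, 3, 30, 0, 31.5),
    (6.2712, 1.0, 5.52, 0.975, 1.0, 5, 2, 30, 0, 31.0),
    (5.9592, 1.0, 6.666, 1.121, 2.0, 6, 3, 32, 0, 30.9),
    (5.05, 1.0, 5.0, 1.02, 0.0, 5, 2, 46, 1, 30.0),
    (5.6039, 1.0, 9.52, 1.501, 0.0, 6, 3, 32, 0, 28.9),
    (8.2464, 1.5, 5.15, 1.664, 2.0, 8, 4, 50, 0, 36.9),
    (6.6969, 1.5, 6.902, 1.488, 1.5, 7, 3, 22, 1, 41.9),
    (7.7841, 1.5, 7.102, 1.376, 1.0, 6, 3, 17, 0, 40.5),
    (9.0384, 1.0, 7.8, 1.5, 1.5, 7, 3, 23, 0, 43.9),
    (5.9894, 1.0, 5.52, 1.256, 2.0, 6, 3, 40, 1, 37.5),
    (7.5422, 1.5, 4.0, 1.69, 1.0, 6, 3, 22, 0, 37.9),
    (8.7951, 1.5, 9.89, 1.82, 2.0, 8, 4, 50, 1, 44.5),
    (6.0931, 1.5, 6.7265, 1.652, 1.0, 6, 3, 44, 0, 37.9),
    (8.3607, 1.5, 9.15, 1.777, 2.0, 8, 4, 48, 1, 38.9),
    (8.14, 1.0, 8.0, 1.504, 2.0, 7, 3, 3, 0, 36.9),
    (9.1416, 1.5, 7.3262, 1.831, 1.5, 8, 4, 31, 0, 45.8),
    (12.0, 1.5, 5.0, 1.2, 2.0, 6, 3, 30, 1, 41.0),
]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows):
    """Return (coefficients, R^2, adjusted R^2, standard error); coefficients[0] is the intercept."""
    X = [[1.0] + [float(v) for v in row[:-1]] for row in rows]
    y = [float(row[-1]) for row in rows]
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, x)) for x in X]
    mean_y = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    n, k = len(y), p - 1
    r2 = 1 - sse / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    se = (sse / (n - k - 1)) ** 0.5
    return beta, r2, adj_r2, se


beta, r2, adj_r2, se = fit_ols(ROWS)
print(f"R^2 = {r2:.4f}, adjusted R^2 = {adj_r2:.4f}, standard error = {se:.4f}")
print("coefficients (intercept, A1..A9):", [round(b, 4) for b in beta])
```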
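The F statistic in Table 7 and the t statistics in Table 8 follow directly from the reported sums of squares, coefficients and standard errors. The Python sketch below recomputes them from those reported figures. The critical value 2.101 (two-tailed, 5% level, 18 residual degrees of freedom) is taken from standard t tables and hard-coded as an assumption; `anova_summary` and `significant_terms` are hypothetical helpers.

```python
# Figures copied from Table 7 (ANOVA) and Table 8 (coefficients).
SSR, SSE, SST = 5108.93, 302.2588, 5411.189
DF_REG, DF_RES = 9, 18

# Assumed constant: two-tailed 5% critical value of Student's t with 18 df,
# taken from standard tables rather than computed here.
T_CRIT_18 = 2.101

# term: (coefficient, standard error)
COEFFICIENTS = {
    "Intercept": (5.790048, 7.088691),
    "A1": (1.198298, 0.744557),
    "A2": (8.406358, 5.570702),
    "A3": (0.061168, 0.52278),
    "A4": (12.68113, 4.579596),
    "A5": (1.74983, 1.72928),
    "A6": (-0.54835, 2.355476),
    "A7": (-1.02187, 3.523712),
    "A8": (-0.07021, 0.082827),
    "A9": (2.229186, 2.374547),
}


def anova_summary(ssr, sse, sst, df_reg, df_res):
    """Mean squares, F statistic and R^2 from the regression and residual sums of squares."""
    msr = ssr / df_reg
    mse = sse / df_res
    return {"MSR": msr, "MSE": mse, "F": msr / mse, "R2": ssr / sst}


def significant_terms(coefficients, t_crit):
    """Terms whose |coefficient / standard error| exceeds the critical t value."""
    return [name for name, (b, se) in coefficients.items() if abs(b / se) > t_crit]


summary = anova_summary(SSR, SSE, SST, DF_REG, DF_RES)
print(f"MSR = {summary['MSR']:.4f}, MSE = {summary['MSE']:.4f}, F = {summary['F']:.3f}")
for name, (b, se) in COEFFICIENTS.items():
    print(f"{name:<9} t = {b / se:7.3f}")
print(f"|t| > {T_CRIT_18}:", significant_terms(COEFFICIENTS, T_CRIT_18))
```

Because each t statistic is simply the coefficient divided by its standard error, this check depends only on the numbers already printed in Table 8, not on rerunning the regression.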
The table below gives the ANOVA results. The p-value for the F statistic is 1.79 × 10^-9 (far below the 5% level of significance), so we reject the null hypothesis that all nine slope coefficients are zero and conclude that the overall model is significant at the 5% level. That is, the nine independent variables jointly have a statistically significant effect on the dependent variable (selling price).

Table 7: ANOVA
             df    SS          MS          F          Significance F
Regression   9     5108.93     567.6589    33.80501   1.79E-09
Residual     18    302.2588    16.79215
Total        27    5411.189

The last table below gives the regression coefficients and the significance of the individual variables in the model.

Table 8: Coefficients table
            Coefficients   Standard Error   t Stat     P-value   Lower 95%   Upper 95%
Intercept   5.790048       7.088691         0.816801   0.000     -9.10274    20.68283
A1          1.198298       0.744557         1.609411   0.000     -0.36596    2.762555
A2          8.406358       5.570702         1.50903    0.000     -3.29725    20.10997
A3          0.061168       0.52278          0.117005   0.000     -1.03715    1.159489
A4          12.68113       4.579596         2.769051   0.000     3.05976     22.30251
A5          1.74983        1.72928          1.011883   0.002     -1.88325    5.382912
A6          -0.54835       2.355476         -0.2328    0.012     -5.49703    4.400316
A7          -1.02187       3.523712         -0.29      0.023     -8.42491    6.381177
A8          -0.07021       0.082827         -0.84768   0.000     -0.24422    0.103802
A9          2.229186       2.374547         0.938783   0.000     -2.75955    7.217924

The P-values in this table are not consistent with the t statistics and confidence intervals reported alongside them. With 18 residual degrees of freedom, the two-tailed critical value of t at the 5% level is about 2.101, and only A4 (t = 2.77) exceeds it; likewise, only the 95% confidence interval for A4 (3.06 to 22.30) excludes zero. At the 5% level, therefore, only the size of the living space (A4) is individually significant, and the remaining coefficients cannot be distinguished from zero. This is consistent with the strong correlations among the independent variables in Table 5 (for example, r = 0.894 between A2 and A4 and r = 0.924 between A6 and A7), which make it hard to separate their individual effects even though the model as a whole is highly significant. The negative coefficients for rooms and bedrooms, despite their positive correlations with the selling price (0.777 and 0.701), likely reflect the same overlap.

Each coefficient below describes the change in selling price associated with a one-unit increase in that variable, holding the other variables constant; a one-unit decrease has the opposite effect.
The coefficient of the local selling prices, in hundreds of dollars (A1), is 1.1983: a one-unit increase in the local selling prices is associated with an increase of 1.1983 in the selling price (Yang, 2009).
The coefficient of the number of bathrooms (A2) is 8.4064: each additional bathroom is associated with an increase of 8.4064 in the selling price.
The coefficient of the area of the site in thousands of square feet (A3) is 0.0612: each additional thousand square feet of site area is associated with an increase of 0.0612 in the selling price (Tofallis, 2009).
The coefficient of the size of the living space in thousands of square feet (A4) is 12.6811: each additional thousand square feet of living space is associated with an increase of 12.6811 in the selling price.
The coefficient of the number of garages (A5) is 1.7498: each additional garage is associated with an increase of 1.7498 in the selling price.
The coefficient of the number of rooms (A6) is -0.5484: each additional room is associated with a decrease of 0.5484 in the selling price.
The coefficient of the number of bedrooms (A7) is -1.0219: each additional bedroom is associated with a decrease of 1.0219 in the selling price.
The coefficient of the age of the house in years (A8) is -0.0702: each additional year of age is associated with a decrease of 0.0702 in the selling price.
The coefficient of the number of fireplaces (A9) is 2.2292: each additional fireplace is associated with an increase of 2.2292 in the selling price.

Conclusion
This study sought to investigate factors that affect the selling prices of houses. Together, the nine factors explain about 94% of the variation in selling prices, and the overall model is highly significant (F = 33.81, p < 0.001). The local selling price, number of bathrooms, area of the site, size of the living space, number of garages and number of fireplaces had positive coefficients, while the age of the house, number of rooms and number of bedrooms had negative coefficients. Individually, however, only the size of the living space (A4) was significant at the 5% level.

References
Allen, M. J., Madura, S. & Wiant, K., 2008. Commercial bank exposure and sensitivity to the real estate market. Journal of Real Estate Research, 10(2), pp. 129-140.
Boddy, R. & Smith, G., 2009. Statistical methods in practice: for scientists and technologists. Journal of Statistics, 5(1), pp. 95-96.
Kucukmehmetoglu, M. & Geymen, A., 2008. Measuring the spatial impacts of urbanization on the surface water resource basins in Istanbul via remote sensing. Environmental Monitoring and Assessment, 142(1-3), pp. 153-169.
Schouhamer, K. I. & Weber, J., 2010. Minimum Pearson Distance Detection for Multilevel Channels With Gain and/or Offset Mismatch. IEEE Transactions on Information Theory, 60(10), pp. 5966-5974.
Tofallis, C., 2009. Least Squares Percentage Regression. Journal of Modern Applied Statistical Methods, 7(5), pp. 526-534.
Tze, S. O., 2013. Factors Affecting the Price of Real Estate Properties in Malaysia. Journal of Emerging Issues in Economics, Finance and Banking, 1(5), pp. 34-45.
Vigenia, D. & Kritikos, A. S., 2004. The individual micro-lending contract: is it a better design than joint liability? Evidence from Georgia. In: Economic Systems. Journal of Finance, pp. 155-176.
Wang, D. & S.-M, L., 2006. Socio-economic Differentials and Stated Housing Preferences in Guangzhou. Habitat International, 30(2), pp. 305-326.
Yang, J. L., 2009. Human age estimation by metric learning for regression problems. International Conference on Computer Analysis of Images and Patterns, pp. 74-82.
Appendix
Index  A1       A2    A3       A4      A5    A6   A7   A8   A9   B
1      4.9176   1     3.472    0.998   1     7    4    42   0    25.9
2      5.0208   1     3.531    1.5     2     7    4    62   0    29.5