Visualization and Statistical Analysis of Real World Dataset


Added on  2022/12/22

AI Summary
This report provides a detailed analysis of a real world dataset on housing market in the UK. It includes visualization methods, descriptive statistics, correlation analysis, and more. The dataset includes information on house prices, types of houses, number of bedrooms and bathrooms, and distance from railway station.

Visualization and statistical analysis of a real world dataset

EXECUTIVE SUMMARY
This report analyses a data set of 80 houses described by characteristics such as price, type of
house, number of bedrooms and bathrooms, and distance from the railway station. The objective
of the report is to carry out a detailed analysis of the housing market in postcode B17, United
Kingdom. The analysis has been carried out by a research company which acts as an estate
agency. The report also covers several visualization methods, descriptive statistical analysis,
correlation analysis and regression analysis.
Contents
EXECUTIVE SUMMARY
MAIN BODY
REFERENCES
MAIN BODY
1. A description of the problem, the source of your data.
Description of problem- The problem is to identify the characteristics of houses which need
to be considered when analysing price, such as the number of bedrooms and bathrooms.
In addition, it was difficult to find a reliable source from which the data could be
gathered effectively.
Source of data- The data has been taken from the property website Rightmove
(rightmove.co.uk), which publishes real estate listings together with various property
characteristics.
Sampling method: Random sampling is a method in which each member of the population
has an equal likelihood of being selected. A randomly selected sample is intended to be
an unbiased representation of the entire population (Chambers, 2018). If the sample does
not represent the population exactly, the difference is known as sampling error. Random
sampling is one of the simplest ways of gathering information about a population.
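The sampling step can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the report does not say how many listings were available, so the sampling frame of 500 listing IDs and the seed value are hypothetical.

```python
import random

# Hypothetical sampling frame: IDs of all listings available for the postcode.
# The size of 500 is an assumption; the report does not state it.
population_ids = list(range(1, 501))

# Fixed seed (arbitrary) so that the draw can be reproduced.
rng = random.Random(42)

# Simple random sample without replacement: every listing has the same
# chance of being one of the 80 houses analysed.
sample_ids = rng.sample(population_ids, k=80)

print(len(sample_ids), "listings selected")
```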
2. Produce at least three visualization methods.
Data visualization is the graphical representation of facts and statistics. Data analysis
applications make it easier to see and interpret trends, outliers and patterns in data
by using graphic elements such as tables, diagrams and charts (Schabenberger and
Gotway, 2017).
Scatter chart- A scatter plot is a type of plot or statistical diagram which displays the values of
two variables for a data set using Cartesian coordinates. A further variable can be
shown if the points are coded, for example by colour, shape or size.

[Figure: Scatter chart. Legend: Series2, Bedroom, Washroom, Distance from railway station.]
Line chart- A line chart is a type of diagram used to show information that changes over time.
It is drawn by connecting a series of points with line segments, and it is also called a line
graph (Miles, Huberman and Saldaña, 2018). The line chart has two axes, the "x" axis
and the "y" axis. The x-axis is the horizontal axis and the y-axis is the vertical axis.
[Figure: Line chart. x-axis: type of house (End terrace, Semi detached, Detached, Terraced).
Legend: Series1, Bedroom, Washroom, Distance from railway station.]
Pie chart- A pie chart is a circular graphic that is divided into slices to show proportions of a
whole. The arc length of each slice, and hence its angle and area, is proportional to the
quantity it represents.
[Figure: Pie chart. Categories: type of house (End terrace, Semi detached, Detached, Terraced).]
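How a pie chart allocates its slices can be shown with a small calculation. The sketch below is illustrative only; it assumes the house-type counts reported in the Between-Subjects Factors table in section 4 (5, 28, 34 and 13 houses), and the variable names are hypothetical.

```python
# Counts per type of house, taken from the Between-Subjects Factors table.
house_type_counts = {
    "Detached": 5,
    "End terrace": 28,
    "Semi detached": 34,
    "Terraced": 13,
}

total_houses = sum(house_type_counts.values())

# Each slice angle is proportional to its share of the whole (360 degrees).
slice_angles = {
    house_type: 360 * count / total_houses
    for house_type, count in house_type_counts.items()
}

for house_type, angle in slice_angles.items():
    share = 100 * house_type_counts[house_type] / total_houses
    print(f"{house_type}: {share:.1f}% of houses, {angle:.1f} degrees")
```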
3. A clear summary and table of the descriptive statistics and the information which can be
obtained from these statistics.
Descriptive statistics-
Statistics: Bedroom
N Valid          80
N Missing        0
Mean             4.08
Median           4.00
Mode             4
Std. Deviation   1.053

Bedroom (frequency table)
Value   Frequency   Percent   Valid Percent   Cumulative Percent
3       23          28.7      28.7            28.7
4       36          45.0      45.0            73.8
5       17          21.3      21.3            95.0
6       3           3.8       3.8             98.8
10      1           1.3       1.3             100.0
Total   80          100.0     100.0
Interpretation: The above analysis shows that, among the 80 houses, the typical number of
bedrooms is 4 (mean 4.08, median 4, mode 4). Only one house has 10 bedrooms. The standard
deviation of 1.05 shows that most houses lie within about one bedroom of the mean, while the
single 10-bedroom house is an outlier which pulls the mean slightly above the median.
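The bedroom statistics can be reproduced from the frequency table alone. The following is a minimal sketch using Python's standard `statistics` module; it assumes the sample standard deviation (n - 1 denominator), which matches the reported value of 1.053. Variable names are illustrative.

```python
import statistics

# Frequency table for Bedroom: number of bedrooms -> number of houses.
bedroom_frequencies = {3: 23, 4: 36, 5: 17, 6: 3, 10: 1}

# Expand the table back into one value per house.
bedrooms = [
    value
    for value, count in bedroom_frequencies.items()
    for _ in range(count)
]

bedroom_mean = statistics.mean(bedrooms)
bedroom_median = statistics.median(bedrooms)
bedroom_mode = statistics.mode(bedrooms)
bedroom_sd = statistics.stdev(bedrooms)  # sample standard deviation

print(f"N = {len(bedrooms)}")
print(f"Mean = {bedroom_mean:.3f}, Median = {bedroom_median}, Mode = {bedroom_mode}")
print(f"Std. Deviation = {bedroom_sd:.3f}")
```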

Statistics: Washroom
N Valid          80
N Missing        0
Mean             2.00
Median           2.00
Mode             2
Std. Deviation   .871

Washroom (frequency table)
Value   Frequency   Percent   Valid Percent   Cumulative Percent
1       25          31.3      31.3            31.3
2       34          42.5      42.5            73.8
3       18          22.5      22.5            96.3
4       2           2.5       2.5             98.8
5       1           1.3       1.3             100.0
Total   80          100.0     100.0
Interpretation: The above analysis shows that, among the 80 houses, the typical number of
bathrooms is 2 (mean 2.00, median 2, mode 2). Only one house has 5 bathrooms. The standard
deviation of 0.871 shows that the data is spread by a little under one bathroom around the mean,
and 96.3% of houses have between 1 and 3 bathrooms.
Statistics: Price
N Valid          80
N Missing        0
Mean             573174.88
Median           575000.00
Mode             620000
Std. Deviation   132403.546
Interpretation- The above statistical analysis shows that the majority of houses are priced
under 700000 pounds, while only one house is priced at 1100000 pounds. The mean (573174.88)
and the median (575000) are close to each other, which suggests that the bulk of prices is
roughly symmetric about the centre. The standard deviation of 132403.55 is about 23% of the
mean, so prices still vary considerably from house to house.

Statistics: Distance from railway station
N Valid          80
N Missing        0
Mean             2.001250
Median           2.000000
Mode             1.9000
Std. Deviation   .3458593

Distance from railway station (frequency table)
Value    Frequency   Percent   Valid Percent   Cumulative Percent
1.1000   2           2.5       2.5             2.5
1.2000   1           1.3       1.3             3.8
1.4000   1           1.3       1.3             5.0
1.5000   2           2.5       2.5             7.5
1.6000   2           2.5       2.5             10.0
1.7000   9           11.3      11.3            21.3
1.8000   7           8.8       8.8             30.0
1.9000   15          18.8      18.8            48.8
2.0000   7           8.8       8.8             57.5
2.1000   8           10.0      10.0            67.5
2.2000   9           11.3      11.3            78.8
2.3000   4           5.0       5.0             83.8
2.4000   5           6.3       6.3             90.0
2.5000   4           5.0       5.0             95.0
2.6000   2           2.5       2.5             97.5
2.8000   1           1.3       1.3             98.8
2.9000   1           1.3       1.3             100.0
Total    80          100.0     100.0
Interpretation: The above statistical analysis shows that the most common distance from the
railway station is 1.9 miles (15 houses, 18.8%). The shortest distance is 1.1 miles and the
longest is 2.9 miles. The mean is 2.00 miles and the standard deviation of 0.35 miles is small
relative to the mean, so the houses are clustered fairly closely around 2 miles.
4. Assuming that the housing prices are normally distributed, present your manager with the 95%
or 99% confidence interval.
Hypothesis:
H0: Price of houses does not depend on type of house.
H1: Price of houses depends on type of house.
Between-Subjects Factors
Type of house    N
Detached         5
End terrace      28
Semi detached    34
Terraced         13

Tests of Between-Subjects Effects
Dependent Variable: Price
Source            Type III Sum of Squares   df   Mean Square          F          Sig.
Corrected Model   470027594555.343 (a)      3    156675864851.781     13.015     .000
Intercept         18973846877008.734        1    18973846877008.734   1576.146   .000
Type of house     470027594555.344          3    156675864851.781     13.015     .000
Error             914897618993.406          76   12038126565.703
Total             27667280200050.000        80
Corrected Total   1384925213548.749         79
a. R Squared = .339 (Adjusted R Squared = .313)
Interpretation: The above analysis gives a significance value (p value) of 0.000, which is
lower than 0.05. This shows that the mean price differs significantly between types of house
(F = 13.015), so the price of a house depends on its type. Type of house explains about 34%
of the variation in price (R Squared = .339).
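The confidence interval requested in this task can be derived from the price statistics reported in section 3 (mean 573174.88, standard deviation 132403.546, N = 80). The following is a minimal sketch, assuming the normal approximation with z critical values; a t-based interval with 79 degrees of freedom would be slightly wider. The function name is illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(mean, std_dev, n, level):
    """Two-sided confidence interval for a mean (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # e.g. about 1.96 for 95%
    margin = z * std_dev / sqrt(n)             # z times the standard error
    return mean - margin, mean + margin


# Price statistics from the descriptive statistics table.
price_mean = 573174.88
price_sd = 132403.546
n_houses = 80

ci95 = mean_confidence_interval(price_mean, price_sd, n_houses, 0.95)
ci99 = mean_confidence_interval(price_mean, price_sd, n_houses, 0.99)

print(f"95% CI for mean price: {ci95[0]:,.0f} to {ci95[1]:,.0f} pounds")
print(f"99% CI for mean price: {ci99[0]:,.0f} to {ci99[1]:,.0f} pounds")
```

Under these assumptions the 95% interval runs from roughly 544,000 to 602,000 pounds, and the 99% interval is wider because it demands more confidence.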
5. A summary on whether the average price of the different type of houses in your data sample is
in line with the average price in the UK
Type of house    Average price (UK)   Price as per research
Detached         £329,600             £560,509
Semi-detached    £203,943             £489,939
Terrace          £202,972             £536,339
Flat             £250,101             £542,811
[Figure: Chart comparing Average price and Price as per research for Detached, Semi-detached,
Terrace and Flat. Price axis from 0 to 600,000.]
Interpretation: The above table and graph show that the average price of houses in the UK is
much lower than the price found in the research. Depending on the type of house, the research
price is between about 1.7 and 2.6 times the average price in the United Kingdom.
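The size of the gap can be quantified per type of house. The short sketch below divides the research price by the UK average price using the figures from the table above; the variable names are illustrative.

```python
# (UK average price, price as per research) in pounds, from the table above.
prices = {
    "Detached": (329_600, 560_509),
    "Semi-detached": (203_943, 489_939),
    "Terrace": (202_972, 536_339),
    "Flat": (250_101, 542_811),
}

# Ratio of research price to UK average price for each type of house.
price_ratios = {
    house_type: research / uk_average
    for house_type, (uk_average, research) in prices.items()
}

for house_type, ratio in price_ratios.items():
    print(f"{house_type}: research price is {ratio:.2f} times the UK average")
```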
6. Carry out correlation analysis (i.e. correlation matrix) between price (dependent variable) and
all the other house characteristics.
Correlation analysis- In statistics, correlation or dependence is any statistical relationship,
whether causal or not, between two variables (Lista, 2017). In the broadest sense, every
statistical relationship is a correlation, but the term generally refers to the degree to which
a pair of variables are linearly related.
Hypothesis 1:
H0: There is no significant relation between price of house and number of bedrooms.
H1: There is significant relation between price of house and number of bedrooms.
Correlation between price of houses and number of bedrooms:
Correlations
                                Price     Bedroom
Price     Pearson Correlation   1         .695**
          Sig. (2-tailed)                 .000
          N                     80        80
Bedroom   Pearson Correlation   .695**    1
          Sig. (2-tailed)       .000
          N                     80        80
**. Correlation is significant at the 0.01 level (2-tailed).
Interpretation: The above correlation analysis shows that there is a significant positive relation
between price of houses and number of bedrooms. This is justified by the computed correlation of
0.695, which is positive and greater than 0.3, with a significance value of 0.000 (lower than 0.05).
Hypothesis 2:
H0: There is no significant relation between price of house and number of bathrooms.
H1: There is significant relation between price of house and number of bathrooms.
Correlations
                                 Price     Washroom
Price      Pearson Correlation   1         .554**
           Sig. (2-tailed)                 .000
           N                     80        80
Washroom   Pearson Correlation   .554**    1
           Sig. (2-tailed)       .000
           N                     80        80
**. Correlation is significant at the 0.01 level (2-tailed).
Interpretation: The above correlation analysis shows that there is a significant positive relation
between price of houses and number of bathrooms. This is justified by the computed correlation of
0.554, which is positive and greater than 0.3, with a significance value of 0.000 (lower than 0.05).
Hypothesis 3:
H0: There is no significant relation between price of house and distance from railway station.
H1: There is significant relation between price of house and distance from railway station.
Correlations
                                                      Price    Distance from railway station
Price                           Pearson Correlation   1        .050
                                Sig. (2-tailed)                .662
                                N                     80       80
Distance from railway station   Pearson Correlation   .050     1
                                Sig. (2-tailed)       .662
                                N                     80       80
Interpretation: As per the above table, there is no significant relation between price of
houses and their distance from the railway station. The computed Pearson correlation of 0.050
is well below 0.3, and the significance value of 0.662 is higher than 0.05.
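The significance values in the three correlation tables follow from the correlation coefficient and the sample size. The sketch below applies the standard test statistic t = r * sqrt(n - 2) / sqrt(1 - r^2) to the reported values of r. The critical value of about 1.99 (two-sided, 5% level, 78 degrees of freedom) is an approximation taken from standard t tables, and the names used are illustrative.

```python
from math import sqrt


def pearson_t_statistic(r, n):
    """t statistic for testing whether a Pearson correlation differs from zero."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


n_houses = 80
# Approximate two-sided 5% critical value of t with 78 degrees of freedom.
critical_t = 1.99

# Pearson correlations with Price, taken from the three Correlations tables.
correlations = {
    "Bedroom": 0.695,
    "Washroom": 0.554,
    "Distance from railway station": 0.050,
}

t_statistics = {
    name: pearson_t_statistic(r, n_houses) for name, r in correlations.items()
}

for name, t in t_statistics.items():
    verdict = "significant" if abs(t) > critical_t else "not significant"
    print(f"{name}: r = {correlations[name]:.3f}, t = {t:.2f} ({verdict})")
```

Only the distance variable gives a t statistic below the critical value, which agrees with its reported significance value of .662.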
7. Carry out regression analysis.
Regression analysis- Regression analysis is a set of statistical methods used to estimate the
relationships between a dependent variable and one or more predictor variables. It can be used
to measure the strength of association between variables and to model the relationship between
them.
Hypothesis:
H0: There is no significant relation between price and house characteristics.
H1: There is significant relation between price and house characteristics.
Variables Entered/Removed (a)
Model   Variables Entered                                       Variables Removed   Method
1       Distance from railway station, Bedroom, Washroom (b)   .                   Enter
a. Dependent Variable: Price
b. All requested variables entered.
Model Summary
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .701 (a)   .492       .471                96259.056
a. Predictors: (Constant), Distance from railway station, Bedroom, Washroom
ANOVA (a)
Model 1      Sum of Squares      df   Mean Square        F        Sig.
Regression   680723969633.719    3    226907989877.906   24.489   .000 (b)
Residual     704201243915.031    76   9265805840.987
Total        1384925213548.750   79
a. Dependent Variable: Price
b. Predictors: (Constant), Distance from railway station, Bedroom, Washroom
Coefficients (a)
Model 1                         B (unstandardized)   Std. Error   Beta (standardized)   t       Sig.
(Constant)                      220405.329           76261.024                          2.890   .005
Bedroom                         76015.761            14514.833    .604                  5.237   .000
Washroom                        19513.762            17675.966    .128                  1.104   .273
Distance from railway station   1987.656             31678.726    .005                  .063    .950
a. Dependent Variable: Price
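Interpretation: The above regression analysis gives a value of R of 0.701 and R square of 0.492,
so the model explains about 49% of the variation in price. The significance value of the model
is 0.000, which is lower than 0.05, so the house characteristics taken together have a
significant effect on price of houses. Individually, only the number of bedrooms is significant
(Sig. = .000); the number of bathrooms (Sig. = .273) and the distance from the railway station
(Sig. = .950) are not significant once bedrooms are included in the model.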
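The summary figures of the regression can be checked against the sums of squares in the ANOVA table. The following sketch recomputes R square, adjusted R square, the F statistic and the standard error of the estimate; it assumes the standard definitions of these quantities, and the variable names are illustrative.

```python
from math import sqrt

# Values taken from the regression ANOVA table.
ss_regression = 680723969633.719
ss_residual = 704201243915.031
ss_total = 1384925213548.750
df_regression = 3   # number of predictors
df_residual = 76    # n - predictors - 1
n_houses = 80

r_squared = ss_regression / ss_total
adjusted_r_squared = 1 - (1 - r_squared) * (n_houses - 1) / df_residual

mean_square_regression = ss_regression / df_regression
mean_square_residual = ss_residual / df_residual
f_statistic = mean_square_regression / mean_square_residual

# Standard error of the estimate is the square root of the residual mean square.
std_error_estimate = sqrt(mean_square_residual)

print(f"R Square = {r_squared:.3f}, Adjusted R Square = {adjusted_r_squared:.3f}")
print(f"F = {f_statistic:.3f}, Std. Error of the Estimate = {std_error_estimate:.3f}")
```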
Model selection: Regression analysis is a mathematical tool for assessing the links between two
or more variables (Heeringa, West and Berglund, 2017). It enables an organization to understand
what the data points represent and to use them correctly to make informed decisions. Because
the task here is to relate price to several house characteristics, this technique is suitable
and has therefore been used.
8. Carry out the residual analysis.
Residual analysis- Residual analysis plays a major role in the validation of a linear regression.
The model is valid if the errors of the regression model meet the four standard assumptions of
linearity, independence, constant variance and normality (Agresti, 2018). Since these
assumptions also underlie the tests of statistical significance, the results of those tests are
questionable if the assumptions are not fulfilled.
Case Processing Summary (Price by Bedroom)
Bedroom   Valid N   Valid Percent   Missing N   Missing Percent   Total N   Total Percent
3         23        100.0%          0           0.0%              23        100.0%
4         36        100.0%          0           0.0%              36        100.0%
5         17        100.0%          0           0.0%              17        100.0%
6         3         100.0%          0           0.0%              3         100.0%
10        1         100.0%          0           0.0%              1         100.0%

Tests of Normality (b) (Price by Bedroom)
          Kolmogorov-Smirnov (a)        Shapiro-Wilk
Bedroom   Statistic   df   Sig.         Statistic   df   Sig.
3         .164        23   .109         .938        23   .162
4         .123        36   .185         .915        36   .009
5         .297        17   .000         .593        17   .000
6         .385        3    .            .750        3    .000
a. Lilliefors Significance Correction
b. Price is constant when Bedroom = 10. It has been omitted.
9. Write the derived statistical model and give example of its usage.
Predictive analysis uses a statistical model fitted to a dataset. A statistical model is a numerical
(or mathematical) description of the observable reality.
In order to better explain and analyze the statistics, data analysts apply different
mathematical models to the data that they are examining (Górecki et al., 2018).
This approach allows them to identify relationships between variables, to forecast future
data and to visualize data so that non-analysts and clients can use it without having to
scrutinize the raw data.
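Based on the unstandardized coefficients in the Coefficients table of section 7, the derived model can be written as Price = 220405.329 + 76015.761 x Bedroom + 19513.762 x Washroom + 1987.656 x Distance from railway station. The sketch below shows one possible usage. The function name and the example house (4 bedrooms, 2 bathrooms, 2.0 miles from the station) are assumptions chosen for illustration, and the bathroom and distance coefficients should be read with caution because they are not statistically significant.

```python
# Unstandardized coefficients (B) from the regression Coefficients table.
INTERCEPT = 220405.329
COEF_BEDROOM = 76015.761
COEF_WASHROOM = 19513.762
COEF_DISTANCE = 1987.656


def predict_price(bedrooms, washrooms, distance_miles):
    """Predicted price in pounds from the fitted linear regression model."""
    return (
        INTERCEPT
        + COEF_BEDROOM * bedrooms
        + COEF_WASHROOM * washrooms
        + COEF_DISTANCE * distance_miles
    )


# Hypothetical example house: 4 bedrooms, 2 bathrooms, 2.0 miles from the station.
example_price = predict_price(bedrooms=4, washrooms=2, distance_miles=2.0)
print(f"Predicted price: {example_price:,.0f} pounds")

# Each additional bedroom adds the bedroom coefficient to the prediction.
extra_bedroom_price = predict_price(bedrooms=5, washrooms=2, distance_miles=2.0)
print(f"With one more bedroom: {extra_bedroom_price:,.0f} pounds")
```

For the example house the prediction is close to the sample mean price, which is expected because its characteristics are close to the sample averages.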
10. Methods which are taken.
A range of methods has been used as required by the different tasks above. One of them is
comparative analysis, which has been applied to compare average prices in the United Kingdom
with the prices found in the research. In addition, other methods such as correlation analysis,
regression analysis and residual analysis have been used.
These methods were chosen for their efficiency and effectiveness. With the help of each method
it becomes easier for users to understand the nature of the data.
REFERENCES
Chambers, J.M., 2018. Graphical methods for data analysis. CRC Press.
Schabenberger, O. and Gotway, C.A., 2017. Statistical methods for spatial data analysis. CRC
Press.
Miles, M.B., Huberman, A.M. and Saldaña, J., 2018. Qualitative data analysis: A methods
sourcebook. Sage Publications.
Lista, L., 2017. Statistical methods for data analysis in particle physics (Vol. 941). Springer.
Heeringa, S.G., West, B.T. and Berglund, P.A., 2017. Applied survey data analysis. CRC Press.
Agresti, A., 2018. An introduction to categorical data analysis. John Wiley & Sons.
Górecki, T., Krzyśko, M., Waszak, Ł. and Wołyński, W., 2018. Selected statistical methods of
data analysis for multivariate functional data. Statistical Papers, 59(1), pp.153-182.
Online
About data of different house, 2019. [Online]. Available through:
<https://www.rightmove.co.uk/property-for-sale/find.html?locationIdentifier=OUTCODE%5E52&index=24&propertyTypes=&includeSSTC=false&mustHave=&dontShow=&furnishTypes=&keywords=>