BUS708: Statistics and Data Analysis Report: Airbnb Housing Prices

Added on  2022/09/10

AI Summary
This report analyzes the factors influencing Airbnb housing prices using inferential statistics. The study utilizes two datasets: one secondary dataset from Airbnb and a primary dataset collected through surveys. The report investigates the plausibility of a specific proportion for private rooms, analyzes the average price of private rooms after outlier removal, examines differences in room availability across different room types, explores the relationship between accommodation prices and property longitude, and assesses the relationship between gender and room type. The analysis includes numerical summaries, graphical displays, confidence intervals, hypothesis tests, and regression analysis. The key findings indicate that a 40% proportion for private rooms is not plausible, the average price of a private room is greater than $70, there is a significant difference in availability between room types, the longitude of a property can predict the price, and gender and room type are independent. The report concludes with potential areas for further research, such as exploring future housing price growth and developing predictive models.
Running head: BASIC BUSINESS STATISTICS 1
Basic Business Statistics
Student Name
Professor’s Name
University Name
Date
Basic Business Statistics
Section 1
Introduction
The housing industry in Australia is facing an affordability crisis and, as a result, the
government and other stakeholders are investing time and resources to develop an
appropriate housing model that could counter the crisis (McLaren, Yeo & Michael).
Empirical models of the Australian housing market have been developed to quantify the
relationship between construction, prices, rents and vacancies and how these factors can be
taken into account in establishing affordable housing (Saunders & Tulip, 2019). The objective of
this paper is to apply inferential statistics to both primary and secondary data to determine the
basic factors that influence housing prices on Airbnb and to examine whether those factors are
statistically significant.
Description about Dataset
The datasets used are labelled dataset 1 and dataset 2 and their specific features are
discussed below.
Dataset 1
Dataset 1 is collected from an existing website; hence it is secondary data. It
has twenty-two variables of categorical and numeric type. The categorical variables are listing
URL, name, city, property type and room type. The numeric variables are id,
latitude, longitude, accommodates, bathrooms, bedrooms, beds, price, guests included, extra
people, availability_30, availability_60, availability_90, availability_365, number of reviews and
review score rating. The cases in the dataset are the distinct properties listed on
Airbnb from which the data is collected (Shao, 2010). The dataset consists of 10,000 observations.
Dataset 2
Dataset 2 is collected from a one-on-one survey of thirty random international students in
my class; hence it is primary data (Levie, 2012). It has three variables. The variable random
code is discrete numeric, indicating the random id allocated to the student at the time of data
collection. The variable gender is categorical, indicating the sex of the student, while the variable
room type is categorical, indicating the type of room occupied by the international student. The
cases in the dataset are the individual students surveyed. The sample size is thirty. The
limitation of the data is that it is collected from a sample of students in my class only, and
hence it may not be representative of the whole population of
international students (Linoff, 2012).
Literature Review
The housing industry in Australia is facing an affordability crisis, which poses a pressing policy
challenge for the government. Analysis of historical data indicates that by 2016 the
shortfall in social housing had reached a peak of 140,000
houses and the country’s public housing system had become highly run down (Pawson,
Milligan, & Martin, 2018). Moreover, in Australia’s capital cities, houses affordable to
low- and medium-income earners had
become scarce.
To address the scarcity of affordable housing, the government, policy makers and other
stakeholders are investing time and resources to develop an appropriate housing model that
could counter the crisis (McLaren, Yeo & Michael). Specifically, strong emphasis has
been placed on creating a privately owned affordable housing industry, developed through
private financing and supported by the government through public-private partnerships.
Empirical models of the Australian housing market have been developed to quantify the
relationship between construction, prices, rents and vacancies and how these factors can be
taken into account in establishing affordable housing (Saunders & Tulip, 2019).
Section 2: Is 40% a Plausible Value for the Proportion of Private Room in Airbnb Room
Type?
The proportions of the distinct room types available for rent on Sydney Airbnb are
highlighted in the following summary statistics table.
The proportions can be visualized with a pie chart (Lock, 2013). The pie
chart is shown below.
From the pie chart, the proportion of room type “entire home/apt” on Airbnb is 62%,
private room is 35%, hotel room is 1% and shared room is 2% of the total.
Confidence Interval
The sample size is n = 10,000 and the sample proportion of private rooms is p = 0.35. The
normal approximation used for the confidence interval requires that:
$$np \geq 10, \qquad n(1-p) \geq 10$$
Hence,
$$10000(0.35) = 3500 \geq 10, \qquad 10000(1-0.35) = 6500 \geq 10$$
The assumption for the confidence interval is met. The critical value of the test statistic is
given by:
$$Z_c = Z_{1-\alpha/2} = Z_{1-0.05/2} = 1.96$$
The confidence interval is given by:
$$CI = p \pm Z_c \sqrt{\frac{p(1-p)}{n}}$$
$$CI = 0.35 \pm 1.96 \sqrt{\frac{0.35(1-0.35)}{10000}}$$
$$CI = 0.35 \pm 0.0093$$
$$CI = (0.34, 0.36)$$
The lower limit of the confidence interval is 0.34 while the upper limit is 0.36. The
confidence interval for the population proportion P can be written as:
$$34\% \leq P \leq 36\%$$
Since 40% is outside the bound of the confidence interval, we can conclude that 40% is not a
plausible value for the proportion of private room in Airbnb room type.
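The interval above can be reproduced with a short script. This is a minimal sketch using only the Python standard library; the function name `proportion_ci` is an illustrative choice and not part of the original analysis, and the inputs are the sample proportion (0.35) and sample size (10,000) reported in this section.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(p_hat, n, confidence=0.95):
    """Normal-approximation confidence interval for a proportion.

    Assumes n * p_hat >= 10 and n * (1 - p_hat) >= 10.
    """
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Values reported in the text: p = 0.35, n = 10000.
lower, upper = proportion_ci(0.35, 10_000)

# A value is "plausible" if it lies inside the interval.
plausible_40 = lower <= 0.40 <= upper

print(f"95% CI: ({lower:.4f}, {upper:.4f})")
print(f"Is 40% plausible? {plausible_40}")
```

Because 0.40 lies above the upper limit, the check agrees with the conclusion stated in the text.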
Section 3: After an Iteration of Outlier Removal, is the Price of Private Room more than
$70?
To remove the first round of outliers, the lower and upper bounds are determined using
the formulae below.
$$\text{Lower bound} = \text{1st quartile} - (1.5 \times IQR)$$
$$\text{Upper bound} = \text{3rd quartile} + (1.5 \times IQR)$$
Any values that fall below the lower bound or above the upper bound are considered outliers.
After removal of the first outliers, the distribution of the private rooms is described by the
following summary statistics.
Price Summary
count 3509
Q1 51
Q3 90
IQR 39
Lower -7.5
Upper 148.5
New Count 3228
New Mean 69.95
New SD 24.59
New Median 65
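The bounds in the table can be checked from the reported quartiles. The sketch below is illustrative only: the helper names `iqr_fences` and `drop_outliers` and the short `sample_prices` list are hypothetical, while Q1 = 51 and Q3 = 90 are taken from the summary above.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower, upper) outlier bounds from the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(values, lower, upper):
    """Keep only the values inside [lower, upper] (one iteration)."""
    return [v for v in values if lower <= v <= upper]


# Quartiles reported in the price summary.
lower, upper = iqr_fences(51, 90)

# Hypothetical prices, used only to show the filtering step.
sample_prices = [40, 65, 90, 150, 500]
kept = drop_outliers(sample_prices, lower, upper)

print(lower, upper)
print(kept)
```

Only one pass is applied here, matching the single iteration of outlier removal described in the section; values that become outliers relative to the new quartiles are left in place.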
From the summary statistics, the average price of a private room is $69.95. The standard
deviation of the price distribution is $24.595 and the median is $65.00. The boxplot below
is a graphical display of the prices of the private rooms after the first round of outlier removal.
The remaining outliers are shown by the stars on the right-hand side. From the box plot,
more observations are concentrated at the low end of the scale; hence the distribution of the price
for private rooms is skewed to the right. Also, the mean is greater than the median in the
summary statistics, confirming the skewness to the right.
Hypothesis Test
The hypotheses under investigation can be stated as:
$$H_0: \mu \geq 70$$
$$H_1: \mu < 70$$
The standard deviation for the population is unknown and the test to be performed is a left-tailed
t-test for a single mean. The level of significance is 0.05 and the critical value of
t is -1.645. The rejection region is:
$$R = \{t : t \leq -1.645\}$$
The test statistic is given by:
$$t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} = \frac{69.95 - 70}{24.595/\sqrt{3228}} = -0.116$$
The p-value obtained is 0.454. Since p = 0.454 > 0.05, we fail to reject the null hypothesis. The
data do not provide sufficient evidence that the mean price of a private room is below $70, so the
hypothesis that the price of a private room is greater than or equal to $70 is retained.
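The test statistic and p-value can be recomputed from the summary figures. This sketch uses the standard normal distribution in place of the t distribution, an assumption that is reasonable here because the sample is large (n = 3228) and is consistent with the critical value of -1.645 used in the text. The function name `left_tailed_test` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def left_tailed_test(x_bar, mu0, s, n):
    """Return the test statistic and an approximate left-tail p-value.

    The p-value uses the standard normal as a large-sample
    approximation to the t distribution with n - 1 degrees of freedom.
    """
    standard_error = s / sqrt(n)
    statistic = (x_bar - mu0) / standard_error
    p_value = NormalDist().cdf(statistic)
    return statistic, p_value


# Summary values after outlier removal, as reported in the text.
t_stat, p_value = left_tailed_test(x_bar=69.95, mu0=70, s=24.595, n=3228)

alpha = 0.05
reject_h0 = p_value < alpha

print(f"t = {t_stat:.3f}, p = {p_value:.3f}, reject H0: {reject_h0}")
```

With the full list of prices available, an exact t-based p-value could be obtained from a statistics library instead; the approximation is used here only to keep the example dependency-free.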
Section 4: Is there a difference in availability in the next 365 days between different room
type?
The numerical summary for the distribution of availability for 365 days in the future for
each room type is shown below.
The boxplot below is a graphical display of the distribution of availability_365.
There are no outliers in the distribution.
From the box plot, the right tail of the distribution of availability_365 for each room type is
longer than the left tail, indicating that the distributions for all room types are
positively skewed. The distribution of availability_365 for private rooms is the most
positively skewed, followed by entire home/apt and shared room respectively. The distribution for
hotel rooms is the least positively skewed. The skewness can also be determined by
comparing the mean and the median of the distribution for each room type in the numerical summary.
The means are greater than the medians, indicating positive skewness (Rumsey, 2015).
Hypothesis Test
To test whether availability in the next 365 days differs between room
types, we use a single-factor ANOVA on availability_365.
The output of the test is shown below.
Anova: Single Factor
SUMMARY
Groups Count Sum Average Variance
Entire home/apt 6205 584444 94.1892 14287.55
Private room 3509 319408 91.02536 16260.88
Hotel room 102 20080 196.8627 12448.97
Shared room 184 24171 131.3641 20594.57
ANOVA
Source of Variation SS df MS F P-value F crit
Between Groups 1360820 3 453606.6 30.08609 2.35E-19 2.605797
Within Groups 1.51E+08 9996 15076.95
Total 1.52E+08 9999
From the ANOVA table above, the p-value (2.35E-19) is much less than the
level of significance (0.05), so we conclude that there is a significant difference in mean
availability in the next 365 days between at least two of the room types.
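The F statistic in the ANOVA table can be rebuilt from the group summary alone. The sketch below takes the count, sum and variance of each group as reported above; the function name `anova_from_summary` is illustrative, and the critical value is copied from the table, not computed.

```python
# (count, sum, variance) for each room type, from the SUMMARY table.
groups = {
    "Entire home/apt": (6205, 584444, 14287.55),
    "Private room": (3509, 319408, 16260.88),
    "Hotel room": (102, 20080, 12448.97),
    "Shared room": (184, 24171, 20594.57),
}


def anova_from_summary(groups):
    """Single-factor ANOVA F statistic from per-group summaries."""
    total_n = sum(n for n, _, _ in groups.values())
    grand_mean = sum(s for _, s, _ in groups.values()) / total_n

    # Between-group sum of squares: weighted squared mean deviations.
    ss_between = sum(n * (s / n - grand_mean) ** 2
                     for n, s, _ in groups.values())
    # Within-group sum of squares: (n - 1) times each sample variance.
    ss_within = sum((n - 1) * var for n, _, var in groups.values())

    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df_between, df_within = anova_from_summary(groups)

F_CRIT = 2.605797  # critical value as listed in the ANOVA table
print(f"F = {f_stat:.3f} on ({df_between}, {df_within}) df")
print(f"Exceeds critical value: {f_stat > F_CRIT}")
```

Comparing F with the tabulated critical value leads to the same decision as the p-value comparison in the text.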
Section 5: Can we Predict the Price of the Accommodation using the Longitude of the
Property
The scatter plot is used to show the relationship between longitude and price. The vertical
axis is the dependent variable price while the horizontal axis is the independent variable
longitude (Bruce, 2015).
The numeric summaries derived from the scatter plot above are illustrated in the table below.
The regression model below can be developed to predict the price of the Airbnb accommodation
using the longitude of the property.
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.136276799
R Square 0.018571366
Adjusted R Square 0.018473204
Standard Error 391.7627377
Observations 10000
ANOVA
df SS MS F Significance F
Regression 1 29036516 29036516 189.19 1.17E-42
Residual 9998 1.53E+09 153478
Total 9999 1.56E+09
Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept -94037.63512 6852.117 -13.7239 1.77E-42 -107469 -80606.1 -107469 -80606.1
longitude 623.3346773 45.31815 13.75464 1.17E-42 534.502 712.1674 534.502 712.1674
The correlation coefficient is 0.136, which indicates a weak positive linear
relationship between price and longitude (Lock, 2013). The coefficient of determination is
0.0185, which indicates that the regression model explains only 1.85% of the variability
in price. The regression equation developed for the relationship between the variables is:
$$Y = 623.335x - 94037.635$$
where Y represents price and x represents longitude. The slope (623.335) is positive and
indicates that a one-degree increase in longitude is associated with an increase of 623.335
in the predicted price (Croucher, 2016). The intercept is -94037.635 and gives the predicted
price when the longitude is zero (Holmes, Illowsky & Dean, 2019), a value far outside the range
of the data that has no practical interpretation. Since the p-value of the independent
variable (1.17E-42) is less than the significance level, the variable (longitude) is statistically
significant. Moreover, the value of Significance F (1.17E-42) is less than the significance level,
indicating that the model as a whole is statistically significant. Longitude can therefore be used
in prediction, although the low coefficient of determination means it accounts for very little of
the variation in price.
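The fitted equation and the internal consistency of the regression output can be illustrated as follows. The coefficients are taken from the summary output; the longitude 151.2 is a hypothetical input chosen only for illustration (Sydney lies at roughly 151 degrees east), and the function name `predict_price` is not from the original report.

```python
# Coefficients from the regression summary output.
INTERCEPT = -94037.63512
SLOPE = 623.3346773
SLOPE_STD_ERROR = 45.31815
MULTIPLE_R = 0.136276799


def predict_price(longitude):
    """Predicted price from the fitted simple linear regression."""
    return INTERCEPT + SLOPE * longitude


# Hypothetical longitude, for illustration only.
example_longitude = 151.2
predicted = predict_price(example_longitude)

# In simple linear regression, R Square is the squared correlation,
# and the overall F statistic is the squared t statistic of the slope.
r_squared = MULTIPLE_R ** 2
t_slope = SLOPE / SLOPE_STD_ERROR
f_stat = t_slope ** 2

print(f"Predicted price at {example_longitude}: {predicted:.2f}")
print(f"R Square = {r_squared:.6f}, t = {t_slope:.3f}, F = {f_stat:.2f}")
```

The reported standard error of about 391.76 is large compared with a prediction of this size, which is in line with the model explaining only 1.85% of the variability in price.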
Section 6: Relationship between gender and room type accommodation
The numerical summary is shown in the table below.
The stacked bar chart below is a graphical display of the relationship between
gender and accommodation type for international students (Newbold, Carlson & Thorne, 2013).
From the numerical summary and the stacked bar chart, more female international students
live in shared rooms, followed by entire home/apt and private rooms respectively. The fewest
of them live in hotel rooms. On the other hand, more male international students
live in private rooms, followed by shared rooms, while the fewest live in hotel rooms and
entire home/apt.
Hypothesis Test