Dublin Business School: B9DA103 Data Mining - Airbnb Case Study
Added on 2022/08/17
AI Summary
This project analyzes Airbnb data from New York City using data mining techniques. The analysis begins with an overview of Airbnb's business model and the dataset features, which include host information, location details, and listing characteristics. The project explores the data using RapidMiner, generating donut charts, pyramid plots, and horizontal bar charts to compare host listings, minimum nights, average prices, and room availability across different neighborhoods. A decision tree algorithm is then employed to predict rental prices based on factors like room type, availability, minimum nights, reviews, and host listings. The results highlight key factors influencing rental values and provide recommendations for optimizing listings and revenue generation, particularly emphasizing investment in private and entire home/apt properties. The project concludes with a discussion of the practical value of data-driven insights for hosts and the overall implications for Airbnb's business strategies. The report includes a bibliography of relevant sources.

Running head: B9DA103 DATA MINING: AIRBNB CASE STUDY
B9DA103 Data Mining: Airbnb Case Study
Name of the Student
Name of the University
Author's note
We have no known conflict of interest to disclose.
The following project is based on an analysis, carried out in RapidMiner, of the data available
at https://www.kaggle.com/dgomonov/new-york-city-airbnb-open-data.
Introduction
Since its founding in 2008, Airbnb has become a business generating 2.6 billion in revenue
and a symbol of the sharing economy. With the sharing economy, Airbnb has also changed the
way people travel and where they stay while travelling. Presently, there are more than 6 million
Airbnb listings worldwide, of which over 1.9 million can be booked instantly by customers.
Airbnb operates in more than 180 countries and more than 60,000 cities. It is found that more
than 250 million guests book their stays in Airbnb-listed properties throughout the world. With
Airbnb, guests are free to book or rent any property listed by the hosts within their preferred
price range. There are few factors that can be used to compare similar listings in a locality or
neighborhood. When listing a property, hosts can charge a premium price if they provide
additional amenities for the guests.
In the following project, data about the listings in the different areas of New York City is
analyzed in order to find the factors that affect the business of Airbnb, the trends in booking
and the details of the listings available in the dataset.
Dataset features
The data set includes a total of 16 columns, among them host_name, neighbourhood_group,
neighbourhood, latitude, longitude, room_type, price, minimum_nights, number_of_reviews,
last_review, reviews_per_month, calculated_host_listings_count and availability_365.
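Before any exploration, it can help to confirm that the columns named above are actually present in the file. The following is a minimal sketch using only the Python standard library, not the RapidMiner workflow used in this project. The helper name missing_columns and the one-row sample are hypothetical and stand in for the real Kaggle CSV file.

```python
import csv
import io

# The thirteen column names listed in the report.
LISTED_COLUMNS = [
    "host_name", "neighbourhood_group", "neighbourhood", "latitude",
    "longitude", "room_type", "price", "minimum_nights",
    "number_of_reviews", "last_review", "reviews_per_month",
    "calculated_host_listings_count", "availability_365",
]

# Hypothetical one-row sample (invented values) standing in for the real file.
SAMPLE_CSV = (
    ",".join(LISTED_COLUMNS) + "\n"
    "Alex,Brooklyn,Kensington,40.6,-73.9,Private room,100,2,5,2019-01-01,0.5,1,200\n"
)


def missing_columns(csv_file, expected=LISTED_COLUMNS):
    """Return the expected column names that are absent from the CSV header."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    return [name for name in expected if name not in header]


missing = missing_columns(io.StringIO(SAMPLE_CSV))
print(missing)  # an empty list means every listed column was found
```

To run the same check on the downloaded data, an open file handle can be passed in place of the io.StringIO object.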
Analysis and Results of the data set
First, the count of host listings in the different neighborhood groups is compared in a
donut chart.
From the above chart it is evident that the largest number of listings is in the Manhattan
neighborhood group; the remaining listings are in Brooklyn, Queens, Bronx and Staten Island.
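The counts behind such a chart can be reproduced outside RapidMiner. The sketch below assumes the listings are available as a list of dictionaries keyed by column name; the rows shown are invented for illustration and are not taken from the dataset.

```python
from collections import Counter

# Hypothetical rows; in practice these would come from the dataset.
rows = [
    {"neighbourhood_group": "Manhattan"},
    {"neighbourhood_group": "Manhattan"},
    {"neighbourhood_group": "Manhattan"},
    {"neighbourhood_group": "Brooklyn"},
    {"neighbourhood_group": "Brooklyn"},
    {"neighbourhood_group": "Queens"},
]


def listings_per_group(rows):
    """Count the listings in each neighbourhood group."""
    return Counter(row["neighbourhood_group"] for row in rows)


def group_shares(counts):
    """Convert counts to fractions of the total, as a donut chart displays them."""
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


counts = listings_per_group(rows)
shares = group_shares(counts)
print(counts.most_common())
```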
While analyzing the different attributes, the following plot is generated, which compares the
minimum nights for the different neighborhood groups.
From the above diagram it can be said that the count of minimum nights is highest for the
Manhattan neighborhood group and lowest for Staten Island.
In further investigation, the average price of the properties is compared across the
neighborhood groups, and the following pyramid plot is generated using the RapidMiner tool.
Here, it can be stated that the lowest average price is recorded for Bronx whereas the
highest average price is recorded for Manhattan.
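The comparison of average prices amounts to grouping the listings by neighbourhood_group and taking the mean of price within each group. A minimal sketch of that calculation follows; the rows and prices are invented for illustration, so the figures are not the project's results.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows with invented prices.
rows = [
    {"neighbourhood_group": "Manhattan", "price": "200"},
    {"neighbourhood_group": "Manhattan", "price": "100"},
    {"neighbourhood_group": "Bronx", "price": "60"},
    {"neighbourhood_group": "Bronx", "price": "80"},
]


def average_price_by_group(rows):
    """Return the mean price for each neighbourhood group."""
    prices = defaultdict(list)
    for row in rows:
        # CSV readers return strings, so the price is converted explicitly.
        prices[row["neighbourhood_group"]].append(float(row["price"]))
    return {group: mean(values) for group, values in prices.items()}


averages = average_price_by_group(rows)
cheapest = min(averages, key=averages.get)
most_expensive = max(averages, key=averages.get)
print(averages, cheapest, most_expensive)
```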
Next, the availability of the different room types in each neighborhood group is plotted.
From the horizontal bar charts, it is evident that Brooklyn has almost the same number of
private room and entire home/apt listings. Manhattan has a larger number of entire home/apt
listings than private rooms.
Now the number of reviews for the properties in the different neighborhood groups is
plotted.
Here it is evident that private rooms are most popular in Brooklyn and entire home/apt
listings are most popular in the Manhattan neighborhood group.
Use of decision tree in prediction of rentals
In this section the prediction of rentals is carried out using the decision tree algorithm.
The decision tree algorithm is considered one of the most powerful and popular tools for
classification and prediction. A decision tree is a flowchart-like tree structure. In this
structure every internal node represents a test on a specific attribute, and every branch denotes
an outcome of the performed test. Every leaf (terminal) node holds a class label.
This algorithm belongs to the family of supervised learning algorithms in data analysis
and machine learning. Unlike some other supervised learning techniques, decision trees can be
used to solve both regression and classification problems.
The prime objective of using a decision tree is to create and train a model which can
predict the class of a predetermined target variable through a supervised learning process.
In decision trees, in order to predict a class label for a specific record, the classification
starts from the root. The value of the root attribute of the tree is compared with the record's
attribute. Depending on the comparison, the corresponding branch is followed to the next node,
where the same comparison is made for that node's attribute, until a leaf node is reached.
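The traversal described above can be sketched in a few lines. This is an illustration, not RapidMiner's implementation. The class names Leaf and Split and the function predict are hypothetical, and the small tree used here is the Shared room branch of the tree reported later in this document.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    value: float  # prediction stored at a terminal node


@dataclass
class Split:
    attribute: str    # attribute tested at this node
    threshold: float  # records with attribute > threshold follow `above`
    above: Union["Split", Leaf]
    below: Union["Split", Leaf]


def predict(node, record):
    """Start at the root and follow one branch per test until a leaf is reached."""
    while isinstance(node, Split):
        if record[node.attribute] > node.threshold:
            node = node.above
        else:
            node = node.below
    return node.value


# Shared room branch of the reported tree.
shared_room_tree = Split(
    "calculated_host_listings_count", 2.5,
    above=Split("reviews_per_month", 0.045, Leaf(44.746), Leaf(418.500)),
    below=Split("number_of_reviews", 1.5, Leaf(74.917), Leaf(113.757)),
)

# Hypothetical record used only to exercise the traversal.
record = {
    "calculated_host_listings_count": 3,
    "reviews_per_month": 1.0,
    "number_of_reviews": 10,
}
print(predict(shared_room_tree, record))
```

Each pass through the loop corresponds to one comparison between a node's attribute and the record's value for that attribute.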
In order to analyze the factors that affect the rent values, the following decision tree is
developed.
The leaf nodes in the fourth level are the predicted prices, which depend on the factors
room type, availability, minimum nights, reviews per month and, finally, the number of host
listings in a specific area.
In addition, the following are the factors and their values that are responsible for deciding
the rental value of the different properties:
room_type = Entire home/apt
| longitude > -73.964
| | availability_365 > 361.500: 296.527 {count=347}
| | availability_365 ≤ 361.500: 165.173 {count=12111}
| longitude ≤ -73.964
| | availability_365 > 357.500: 476.980 {count=610}
| | availability_365 ≤ 357.500: 242.418 {count=12268}
room_type = Private room
| minimum_nights > 95
| | minimum_nights > 117: 209.804 {count=51}
| | minimum_nights ≤ 117: 3432.625 {count=8}
| minimum_nights ≤ 95
| | longitude > -73.966: 75.133 {count=16230}
| | longitude ≤ -73.966: 123.068 {count=5909}
room_type = Shared room
| calculated_host_listings_count > 2.500
| | reviews_per_month > 0.045: 44.746 {count=551}
| | reviews_per_month ≤ 0.045: 418.500 {count=2}
| calculated_host_listings_count ≤ 2.500
| | number_of_reviews > 1.500: 74.917 {count=336}
| | number_of_reviews ≤ 1.500: 113.757 {count=267}
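The leaf values and counts above can be summarized per room type. The sketch below assumes that each leaf value is the mean price of the training records that reach it, as is usual for regression trees, so that a count-weighted mean of the leaves approximates the mean price of that room type. The cut-off MIN_SUPPORT is a hypothetical value chosen for illustration and does not come from this project.

```python
from collections import defaultdict

# (room_type, predicted_price, count) for each leaf, copied from the tree above.
LEAVES = [
    ("Entire home/apt", 296.527, 347),
    ("Entire home/apt", 165.173, 12111),
    ("Entire home/apt", 476.980, 610),
    ("Entire home/apt", 242.418, 12268),
    ("Private room", 209.804, 51),
    ("Private room", 3432.625, 8),
    ("Private room", 75.133, 16230),
    ("Private room", 123.068, 5909),
    ("Shared room", 44.746, 551),
    ("Shared room", 418.500, 2),
    ("Shared room", 74.917, 336),
    ("Shared room", 113.757, 267),
]

MIN_SUPPORT = 10  # hypothetical cut-off, not taken from the report


def weighted_mean_by_room_type(leaves):
    """Average the leaf predictions of each room type, weighted by leaf count."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for room_type, price, count in leaves:
        totals[room_type] += price * count
        counts[room_type] += count
    return {room_type: totals[room_type] / counts[room_type] for room_type in totals}


def low_support_leaves(leaves, min_support=MIN_SUPPORT):
    """Return the leaves that are backed by fewer than min_support records."""
    return [leaf for leaf in leaves if leaf[2] < min_support]


means = weighted_mean_by_room_type(LEAVES)
sparse = low_support_leaves(LEAVES)
print(means)
print(sparse)
```

Under this assumption, the two leaves returned by low_support_leaves rest on 8 and 2 records respectively, so their high predicted prices may reflect a few unusual listings rather than a stable pattern.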
Recommendations
As displayed in the data exploration and analysis section, Manhattan and Brooklyn, compared
to the other neighborhood groups, generate their revenue mostly from properties which are
private rooms or entire home/apt. Therefore, it can be stated that the organization should invest
more in private rooms and entire home/apt properties, so that it can reduce the operational costs
required for shared rooms in the different neighborhood groups.
The exploratory data analysis was carried out using the publicly available dataset about
Airbnb listings in New York City. In the analysis phase, the primary approach was to build a
predictive decision tree that suggests an optimal price for any new listing to be added to the
stored data. Hosts who are going to add a new listing can use the historical data to get an idea
of the price other hosts are charging for similar property listings. In this way, the prediction
process adds practical value for the hosts, as they can get an idea of the prices that match the
amenities they are providing to the guests. In this way it would be possible to improve the
revenue from the business.
Conclusion
Considering the average price of the different properties in the neighborhood groups, it is
found that Airbnb's average price per reservation is $80 and that the fastest-growing region for
Airbnb hosts is the Brooklyn area. The minimum number of nights also affects the rental of the
properties listed on Airbnb. It is therefore suggested to reduce the minimum number of nights
for the properties, which may lead to more reservations as well as higher revenues from the
properties that are hosted on Airbnb.
Bibliography
[1] H. Wiemer, L. Drowatzky and S. Ihlenfeldt, "Data Mining Methodology for
Engineering Applications (DMME)—A Holistic Extension to the CRISP-DM Model", Applied
Sciences, vol. 9, no. 12, p. 2407, 2019. Available: 10.3390/app9122407.
[2] A. Francis and E. Sullivan, "Exploring Real Data A Look at Airbnb", Math Horizons,
vol. 25, no. 3, pp. 14-17, 2018. Available: 10.1080/10724117.2018.1424459.
[8] D. Jaremen and E. Nawrocka, "AIRBNB COMPETITIVENESS ON THE
HOSPITALITY MARKET SECTOR", Prace Naukowe Uniwersytetu Ekonomicznego we
Wrocławiu, no. 473, pp. 286-296, 2017. Available: 10.15611/pn.2017.473.26.