Data Handling and Decision Making Report for DAT7001 Module
Added on 2022/09/07
AI Summary
This report analyzes TripAdvisor's data handling and decision-making processes, covering data sources, data flow, data integrity, and ethical considerations. It examines how TripAdvisor uses data from listings, sales, and user reviews to inform decisions on package development, resource allocation, and sustainability, and proposes improvements such as cluster analysis and sentiment analysis to enhance decision-making. The report also addresses data protection, ethical concerns, and the challenges of big data storage. Finally, it analyzes data on restaurants in European cities to determine the relationship between the number of reviews and the ratings, with the aim of informing a critical decision on whether to remove older reviews to address cost and sustainability issues.
Data Handling and Decision Making
Task 1
About TripAdvisor
TripAdvisor is a travel and tours company with its headquarters in the United States of America
(TripAdvisor, 2019). The company is website-based, with all interactions with consumers taking
place on the platform. The services offered on the TripAdvisor website include
destination reviews, destination ratings, travel logistics, destination listings and bookings, and
advertisements (TripAdvisor, 2019). Destinations in this respect are hotels and restaurants,
while travel logistics covers cruises, car tours and flights. The company mainly earns from
travel logistics, destination listings and bookings, and advertisements (TripAdvisor, 2019). The
user-generated content, namely destination reviews and destination ratings, attracts users who
are either planning to travel or already travelling. Ratings and reviews are an important
information source, especially for travelers, on which destinations are good and which are not
(Ki-Joon & Choong-Ki, 2015; Conlin, 2009).
Current Data Use
Being a website-based company, TripAdvisor has access to a huge amount of data,
essentially big data. Many of the website-based companies established in the early stages of
internet and web technology did not foresee the potential of their data at that time (Vicenc,
2017), largely because data technology was not as advanced then as
it is now (Laudon & Guercio, 2014). TripAdvisor, established in 2000, is one of these
companies (TripAdvisor, 2019). With data technology now highly developed, however,
the potential of TripAdvisor's data is visible and can be utilized. The company has the
following financial and non-financial data, which it uses for decision making in its
operations:
1. Listings Data: This data contains information on the hotels, restaurants and travel services that
have registered to be listed on the website. It includes the name of the listing, the
type of listing (hotel, restaurant or travel service), the number of subscriptions, the
package subscribed to and the country or city of origin of the listing.
2. Sales Data: This data contains information on the amount of money generated from the
different revenue sources at TripAdvisor (travel logistics, destination listings and
bookings, and advertisements). Under each revenue source, sales data is available
for each package provided. The sales data is also available across different periods
(days, months and years) and across different geopolitical regions (continents,
countries and cities).
Data Integrity
Data integrity refers to the quality of data in terms of its accuracy, consistency and maintenance
(Lenca & Ferretti, 2018; Pierre, 2011). In terms of both consistency and
maintenance, TripAdvisor performs well. Data on the same variables is available for
the period from 2011 to date, which indicates consistency in the collection and
storage of data. Consistency is closely related to maintenance, with the
level of maintenance of the data to some extent indicating its consistency (Karolin
& Schrape, 2018). The accuracy of the data is, however, questionable, especially with respect to
reviews. Distinguishing an accurate review from a malicious one presents a challenge to
TripAdvisor.
Data Sources-Business Functions Map
The chart in Figure 1: Source-to-Function Map below summarizes the data from its
source to its use at TripAdvisor.
Figure 1: Source-to-Function Map
Task 2
Data Flow
There are three main stakeholders for TripAdvisor: Content Generating Customers, Consuming
Customers and Investors. The data flows between TripAdvisor and these stakeholders are
as follows:
1. Content Generating Customers: These are customers who visit the TripAdvisor website to
write reviews and rate destinations, and who only consume the advertisements on the
website. The data flow between content generating customers and TripAdvisor is
therefore an inflow. Although this data is not processed, it is the most
relevant data for TripAdvisor, since those visiting the site for trip advice are particularly
interested in it. Ratings and reviews of products need to be current in
order to provide useful information (Naomi & Heiberger, 2011). This implies that
content generating customers remain continuously relevant to TripAdvisor.
2. Consuming Customers: These are customers who either use TripAdvisor for planning
their travels or list their destinations on TripAdvisor. There is both an outflow and an inflow of
data between TripAdvisor and consuming customers. In the outflow, TripAdvisor
provides the reviews and ratings generated by the content generating customers
to the consuming customers who use TripAdvisor for planning their travels. In the
inflow, the consuming customers provide TripAdvisor with the Sales and Listings data
described in Current Data Use above. The outflowing data is important as the key aspect
that attracts consuming customers to the site, while the inflowing data is critical to
operational decision-making at TripAdvisor.
3. Investors: These are the capital owners and/or stakeholders of TripAdvisor.
TripAdvisor is listed on the NASDAQ, hence those who own shares of TripAdvisor are its
investors (TripAdvisor, 2019). The data flow between TripAdvisor and its investors is
mainly an outflow. The data is presented to investors as reports, hence processing is
necessary for this data flow.
Proposed Improvements
Introducing cluster analysis for both listings and sales data would improve decision-making
at TripAdvisor. Cluster analysis is a data analysis technique that groups items
(subjects or observations) in a dataset according to their level of homogeneity (Yu, et
al., 2011; Malki & Rizk, 2016; Ren & Ying, 2010). Clustering the listings data would show
which restaurants, hotels and travel services have similar listings and
hence aid package development, with each cluster of restaurants, hotels or travel services
having its own customized package(s). The ratings and reviews data also need to be collected, stored and
processed, since they have the potential to provide information that would further improve
decision making.
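The grouping idea behind cluster analysis can be illustrated with a minimal k-means sketch. This is only one possible clustering method and is not stated in the report as the method TripAdvisor would use; the listing features (number of subscriptions, package price), the sample values and the function name `k_means` are all assumptions made for illustration.

```python
from math import dist

def k_means(points, k, iterations=20):
    """Group points into k clusters by alternating assignment and update steps."""
    # Assumption: the first k points seed the centroids so the result is
    # deterministic; a real analysis would normally use random restarts.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its current members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids

# Hypothetical listings: (number of subscriptions, package price)
listings = [(1, 20.0), (2, 25.0), (1, 22.0), (8, 200.0), (9, 210.0), (7, 190.0)]
labels, centroids = k_means(listings, k=2)
print(labels)
```

In this toy data the small-subscription listings and the large-subscription listings end up in separate clusters, which is the kind of grouping that could inform a customized package per cluster. With real data the features would normally be standardized first, since the price column would otherwise dominate the distance.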
Data Integrity, Protection and Ethics
The main data integrity concern remains the reviews; this would be addressed by applying
sentiment analysis to the reviews and summarizing each review with a single adjective.
Sentiment analysis is a data analysis technique that draws key opinions as themes from textual
data (Korkontzelos & Nikfarjam, 2016). The analysis would be guided by rules that set a limit on the number of
positive or negative words beyond which a review is flagged as malicious, making the review data more accurate. To ensure
data protection and ethics, the authors of the reviews need to be made aware that their reviews
are going to be analyzed and used for drawing inferences. The same applies to the data collected
from the consuming customers on sales and listings for the proposed cluster analysis approach.
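The rule described above, a limit on the number of positive or negative words that flags a review, can be sketched as follows. The word lists, the limit of 4 and the function name `analyse_review` are hypothetical placeholders rather than values from the report; a real system would rely on a curated sentiment lexicon and a limit tuned on labelled reviews.

```python
import re

# Hypothetical mini-lexicons, for illustration only.
POSITIVE = {"good", "great", "excellent", "friendly", "clean", "delicious"}
NEGATIVE = {"bad", "terrible", "awful", "dirty", "rude", "worst"}
MAX_SENTIMENT_WORDS = 4  # assumed limit, not taken from the report

def analyse_review(text):
    """Count sentiment words, summarize with one adjective, flag extreme reviews."""
    words = re.findall(r"[a-z']+", text.lower())
    positive = sum(word in POSITIVE for word in words)
    negative = sum(word in NEGATIVE for word in words)
    if positive > negative:
        summary = "positive"
    elif negative > positive:
        summary = "negative"
    else:
        summary = "mixed"
    # A review is flagged when either count exceeds the assumed limit.
    flagged = max(positive, negative) > MAX_SENTIMENT_WORDS
    return {"positive": positive, "negative": negative,
            "summary": summary, "flagged": flagged}

genuine = analyse_review("Great food and friendly staff, but the room was dirty.")
suspect = analyse_review("Terrible, awful, dirty, rude staff. Worst hotel, bad food.")
print(genuine)
print(suspect)
```

A simple word count like this cannot handle negation or sarcasm, so it is best read as a first filter that queues reviews for closer inspection rather than as a final verdict.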
Task 3
Decisions on which packages/subscriptions to keep and which to abandon are currently informed
by the listings data. Trend analysis of this data reveals the best
performing subscriptions, which are kept, and the worst performing ones, which are either
improved or removed. Another key decision for TripAdvisor is resource utilization with
regard to employee remuneration, operations expansion and entry into new markets, which is
informed by predictive analysis of the sales data. The predictive analysis gives information
on projected profits for TripAdvisor, hence informing decisions about the company's future
resource utilization.
Although big data is associated with improved decision-making, there is usually a question about
the “shelf life” of big data. Can big data be too big? And is big data always useful data? The
storage and maintenance of big data is becoming an issue of concern, firstly due to cost and
secondly due to sustainability (Baack, 2015). Currently, many companies contract cloud services
for the storage of their data, which, despite being cheaper than owning storage
infrastructure, is expensive in the long term. Cloud services require regular subscriptions,
with the most secure services being costly (De Jong-Chen, 2015). Sustainability is also an issue:
data storage requires a lot of cooling of the storage infrastructure. Even when cloud services
are used, the company offering the cloud service still uses a lot of energy for cooling (Ruth &
Brynhildur, 2017). Many companies are moving towards more sustainable approaches to
operating, and hence the excess energy use involved in data storage raises questions about
sustainability.
This brings us back to the issue of big data being too big. TripAdvisor is currently faced with the
decision on how much data to continue keeping considering cost and sustainability. The largest
volume of data stored by TripAdvisor is the review data collected from the content generating
customers. The reviews are textual data containing comments and opinions about a destination or
travel service. Hence, the question is how far back reviews can date and still be
relevant for a destination or travel service. Recent reviews make more sense if it is assumed that
destinations and travel services are in a constant state of improvement. However, this is just the
ideal case; in reality the assumption may not hold for all destinations and travel
services, in which case older reviews are just as relevant as newer ones.
This decision is important not only for cost effectiveness, but also because it would move
TripAdvisor towards sustainability. The company, being website-based, may seem to be involved
with sustainability only to a very limited extent; however, the energy used in its data storage
presents a situation where a sustainable approach can be adopted. Currently, according to TripAdvisor
(2009), there are 830 million reviews on the website, which
represents a very large amount of data. A move to be more sustainable will
appeal to customers, who of late are more concerned about the carbon footprints of products
and services. The manner in which this is approached remains the critical decision point for
TripAdvisor.
Task 4
About Data
In order to enable decision making for the critical decision point for TripAdvisor, it is important
to understand the relationship that exists between the number of reviews and the ratings in
different places around the world. Understanding this relationship will inform on whether
removing the reviews that date back beyond a certain point will affect the ratings of the
destinations in different locations. This information will reveal the consequences of removing or
keeping the reviews for different locations.
The data used in the analysis to inform the decision point was obtained from
Damien (2017). The data collection process in Damien (2017) involved a data mining
technique known as web scraping. Web scraping is a technique for collecting data from
internet platforms such as ordinary websites and social media sites (Galit, et al., 2018). In Damien
(2017), the data is continuously scraped from the TripAdvisor website, hence also providing
current data. A total of 125 527 observations of restaurants in 31 European cities were made on
the business-related variables in Table 1: Variable Summary Description below.
Table 1: Variable Summary Description
Name of Variable                       | Nature of Variable        | Type of Variable     | Measurement Scale
Name (of Restaurant)                   | Categorical Data Variable | -                    | Nominal Scale
Rating                                 | Numerical Data Variable   | Dependent Variable   | Ratio Scale
Reviews (Sample)                       | Text Variable             | -                    | -
Price Range                            | Numerical Data Variable   | -                    | Ratio Scale
City (where the restaurant is located) | Categorical Data Variable | Independent Variable | Nominal Scale
Number of Reviews                      | Numerical Data Variable   | Independent Variable | Ratio Scale
Data Preparation
The data preparation involved variable reduction and filtering out of null entries, both carried out
in Excel. Variable reduction, or dimension reduction, is a method of minimizing the number of
variables in a dataset so as to retain only the most relevant or useful variables for a study (Howitt &
Cramer, 2010). The initial dataset from Damien (2017) contained the following 11 variables:
Index, Name, Cuisine Style, City, Rating, Ranking, Number of Reviews, Price Range, Reviews,
ID_TA and URL_TA. In the dimension reduction, the number of variables was reduced from 11 to
the 3 used for the analysis: City, Rating and Number of Reviews. Two
extra variables, Name and Ranking, were retained in the dataset as identifiers, bringing the total
number of variables in the final dataset to 5. In data analysis, null entries broadly refer to observations
that have missing values for one or more of the variables (Chambers, 2017). Null entries
contribute to misinterpretation of data characteristics and hence have to be corrected in the
case of small datasets and removed in the case of large datasets (Freedman, 2009). Given that the
initial data from Damien (2017) was a large dataset, the observations with missing entries were removed, resulting
in a final dataset with 108 155 observations.
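The preparation was carried out in Excel, but the same two steps can be sketched in Python for readers who prefer a scripted, repeatable version. The sample rows below are invented, the function name `prepare` is hypothetical, and the sketch assumes that missing values are checked only on the five retained columns, which the report does not state explicitly.

```python
import csv
import io

# The five variables retained in the final dataset, as listed in the report.
KEEP = ["Name", "City", "Rating", "Ranking", "Number of Reviews"]

def prepare(rows):
    """Keep only the retained columns and drop rows with any missing value."""
    cleaned = []
    for row in rows:
        # Variable reduction: project each row onto the retained columns.
        reduced = {col: (row.get(col) or "").strip() for col in KEEP}
        # Null filtering: keep the row only if every retained value is present.
        if all(reduced.values()):
            cleaned.append(reduced)
    return cleaned

# Invented sample rows using a subset of the original column names.
sample_csv = """Name,Cuisine Style,City,Rating,Ranking,Number of Reviews,Price Range
Cafe A,French,Paris,4.5,12,230,$$
Cafe B,Italian,Rome,,40,15,$
Cafe C,British,London,4.0,7,,$$$
Cafe D,Spanish,Madrid,3.5,88,54,
"""
rows = list(csv.DictReader(io.StringIO(sample_csv)))
final = prepare(rows)
print([row["Name"] for row in final])
```

Here the rows missing a Rating or a Number of Reviews are dropped, while a row missing only Price Range survives because that column is not retained.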
Limitations
The data from Damien (2017) is a sample of restaurants in European cities; it excludes
service types other than restaurants as well as restaurants located outside Europe. Therefore, the
generalizability of the inferences from the analysis in this study may be limited compared to
a case where a global random sample is considered.
Task 5 & Task 6
Statistical Methods
The analysis in this study involved the application of three analysis techniques: One-way
ANOVA, Correlation Analysis and Regression Analysis. One-way ANOVA is a
technique for testing whether significant differences exist among the groups of a
categorical variable when a numerical variable is the basis of comparison (Kim,
2014). Correlation analysis is a statistical technique for analyzing the relationship
between two numerical variables (Kirk, 2016). Regression analysis is a technique that
uses mathematical equations to represent the nature of the relationship between
variables (Jaulin, 2010).
There was a need to observe both the separate and the joint relationships that Number of Reviews and City have
with Rating so as to provide more information. The One-way ANOVA is used to observe the
relationship between the City and Rating variables to determine whether the cities differ
significantly in their ratings and number of reviews. The Correlation Analysis is used to
observe the association between the Rating and Number of Reviews variables to determine
whether it is significant. The
Regression Analysis is used to observe the joint association that Number of Reviews and City
have with Rating, to determine how the number of reviews about a destination and the location of
the destination relate to the rating of the destination when considered together.
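To make the first two techniques concrete, the sketch below computes a one-way ANOVA F statistic and a Pearson correlation coefficient from first principles. The ratings and review counts are invented toy values, the function names are hypothetical, and the sketch stops at the test statistics: it does not compute p-values and it does not cover the regression step, for which a statistics package would normally be used.

```python
from math import sqrt

def one_way_anova_f(groups):
    """Return the F statistic: between-group variance over within-group variance."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    # Between-group sum of squares, weighted by group size.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares around each group's own mean.
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

def pearson_r(x, y):
    """Return the Pearson correlation coefficient between two numeric lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread = sqrt(sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y))
    return covariation / spread

# Invented ratings grouped by city (City vs Rating).
ratings_by_city = {"London": [4.0, 4.5, 3.5], "Paris": [3.0, 3.5, 4.0]}
f_stat = one_way_anova_f(list(ratings_by_city.values()))

# Invented paired observations (Number of Reviews vs Rating).
number_of_reviews = [10, 20, 30, 40]
ratings = [3.0, 3.5, 4.0, 4.5]
r = pearson_r(number_of_reviews, ratings)
print(f_stat, r)
```

A large F statistic relative to its reference distribution would indicate that mean ratings differ between cities, while a correlation coefficient close to zero would suggest that removing older reviews, and so lowering the review count, is unlikely to move ratings much.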
Descriptive Statistics
Table 2: City Frequencies below gives the frequencies of the destinations for each of the 31
cities, with the corresponding bar graph given in Figure 2: City Frequencies below. Both the
table and the figure show that, at 14.2% and 12.3% respectively, London and Paris have the
highest numbers of destinations reviewed on TripAdvisor in Europe. At 0.4% and 0.5%
respectively, Ljubljana and Luxembourg have the lowest numbers of destinations reviewed on
TripAdvisor in Europe.
Table 2: City Frequencies
Figure 2: City Frequencies
Table 3: Descriptive Analysis Ratings below shows the average rating for restaurants across
Europe on TripAdvisor, which is 3.989 (observed ratings range from 1.0 to 5.0). The frequency
histogram for the Rating variable is shown in Figure 3: Histogram for Ratings below. From the
plot, we observe that the distribution of ratings is slightly skewed to the left.
Table 3: Descriptive Analysis Ratings
Descriptive Statistics
                     N        Minimum   Maximum   Mean
Rating               108155   1.0       5.0       3.989
Valid N (listwise)   108155
Figure 3: Histogram for Ratings
Table 4: Descriptive Statistics Number of Reviews below presents the mean and standard
deviation of the number of reviews for restaurants across Europe on TripAdvisor. The average
number of reviews is 125 (rounded to the nearest whole number) with a standard deviation of
310.867. The frequency histogram for the Number of Reviews variable is shown in Figure 4:
Histogram for Number of Reviews below. From the plot, we observe that the distribution of the
Number of Reviews is skewed to the right.
Table 4: Descriptive Statistics Number of Reviews
Descriptive Statistics
                     N        Minimum   Maximum   Mean     Std. Deviation
Number Of Reviews    108155   2         16478     125.22   310.867
Valid N (listwise)   108155
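The right skew can also be checked from the summary figures in Table 4 alone. The short Python
sketch below uses only the values reported in the table; the variable names are illustrative
assumptions.

```python
# Summary statistics for Number of Reviews as reported in Table 4.
minimum, maximum = 2, 16478
mean_reviews, sd_reviews = 125.22, 310.867

# Coefficient of variation: standard deviation relative to the mean.
cv = sd_reviews / mean_reviews
# One standard deviation below the mean falls below the observed minimum,
# which is a sign that the distribution is not symmetric.
lower_band = mean_reviews - sd_reviews
# Distance of the maximum from the mean, in standard deviations.
z_max = (maximum - mean_reviews) / sd_reviews

print(f"CV = {cv:.2f}, mean - SD = {lower_band:.1f}, max is {z_max:.1f} SDs above the mean")
```

A standard deviation more than twice the mean, together with a maximum far above the mean, is
consistent with a long right tail made up of a few very heavily reviewed restaurants.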
Figure 4: Histogram for Number of Reviews
Figure 5: Scatterplot City against Mean Number of Reviews compares the average number of
reviews across the 31 European cities on TripAdvisor. From the plot, we observe that Rome and
Edinburgh had the highest and second highest average number of reviews, while Hamburg and
Bratislava had the lowest and second lowest.
Figure 5: Scatterplot City against Mean Number of Reviews
Figure 6: Scatterplot City against Mean Rating compares the average rating across the 31
European cities on TripAdvisor. From the plot, we observe that, in general, the average rating
ranges between 3.8 and 4.2 for all cities.
Figure 6: Scatterplot City against Mean Rating
One-way ANOVA analysis
The summarized results in Table 5: One-way ANOVA Test 1 below are those of the One-way ANOVA
analysis for differences in the Rating variable among the cities in the City variable. The
hypothesis test for this case is as given below:
Null Hypothesis: There does not exist a statistically significant difference between the cities
with respect to the rating.
Alternative Hypothesis: There exists a statistically significant difference between the cities
with respect to the rating.
Table 5: One-way ANOVA Test 1
ANOVA
Rating
                 Sum of Squares   df       Mean Square   F         Sig.
Between Groups   1290.900         30       43.030        113.743   .000
Within Groups    40904.138        108124   .378
Total            42195.038        108154
From Table 5: One-way ANOVA Test 1 above, we observe that the test statistic, F = 113.743, and
that the Sig. column displays the p-value as .000, that is, p < 0.001. At an α significance
level of 0.05, the p-value is less than α. We therefore reject the null hypothesis and conclude
that there exists a statistically significant difference between the cities with respect to the
rating.
The summarized results in Table 6: One-way ANOVA Test 2 below are those of the One-way ANOVA
analysis for differences among the cities in the City variable with regard to the Number of
Reviews variable. The hypothesis test for this case is as given below:
Null Hypothesis: There does not exist a statistically significant difference between the cities
with respect to the number of reviews.
Alternative Hypothesis: There exists a statistically significant difference between the cities
with respect to the number of reviews.
Table 6: One-way ANOVA Test 2
ANOVA
Number Of Reviews
                 Sum of Squares    df       Mean Square   F         Sig.
Between Groups   298340981.017     30       9944699.367   105.900   .000
Within Groups    10153504737.384   108124   93906.115
Total            10451845718.401   108154
From Table 6: One-way ANOVA Test 2 above, we observe that the test statistic, F = 105.900, and
that the Sig. column displays the p-value as .000, that is, p < 0.001. At an α significance
level of 0.05, the p-value is less than α. We therefore reject the null hypothesis and conclude
that there is a statistically significant difference between the cities with respect to the
number of reviews.
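The F statistics in Table 5 and Table 6 can be reproduced from the reported sums of squares and
degrees of freedom. The Python sketch below does this and also computes eta squared (the
between-groups share of the total sum of squares). Eta squared is an effect-size measure that is
not reported in the original analysis and is added here only as an illustration; the function
name is an assumption.

```python
def f_and_eta_squared(ss_between, df_between, ss_within, df_within):
    """Recompute F and eta squared from an ANOVA table's sums of squares."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    f_stat = ms_between / ms_within
    eta_sq = ss_between / (ss_between + ss_within)
    return f_stat, eta_sq

# Values copied from Table 5 (Rating) and Table 6 (Number Of Reviews).
f_rating, eta_rating = f_and_eta_squared(1290.900, 30, 40904.138, 108124)
f_reviews, eta_reviews = f_and_eta_squared(298340981.017, 30, 10153504737.384, 108124)

print(f"Rating:            F = {f_rating:.3f}, eta squared = {eta_rating:.4f}")
print(f"Number Of Reviews: F = {f_reviews:.3f}, eta squared = {eta_reviews:.4f}")
```

On these figures City accounts for roughly 3% of the variation in each variable, which suggests
that the differences between cities are statistically significant but modest in size.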
Correlation Analysis
The summarized results in Table 7: Correlation Analysis below are those of the correlation
analysis of the association between Rating and Number of Reviews. Two hypotheses are tested in
this analysis:
Hypothesis 1
Null Hypothesis: There does not exist an association between rating and number of reviews.
Alternative Hypothesis: There exists an association between rating and number of reviews.
Hypothesis 2
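Null Hypothesis: There does not exist a significant association between rating and number of
reviews, in the sense of an association strong enough to be meaningful.
Alternative Hypothesis: There exists a significant association between rating and number of
reviews, in the sense of an association strong enough to be meaningful.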
Table 7: Correlation Analysis
Correlations
                                          Rating    Number Of Reviews
Rating              Pearson Correlation   1         .034**
                    Sig. (2-tailed)                 .000
                    N                     108155    108155
Number Of Reviews   Pearson Correlation   .034**    1
                    Sig. (2-tailed)       .000
                    N                     108155    108155
**. Correlation is significant at the 0.01 level (2-tailed).
From the Sig. (2-tailed) rows of Table 7: Correlation Analysis above, we observe that the
p-value is displayed as .000, that is, p < 0.001. At an α significance level of 0.05, the
p-value is less than α. We therefore reject the null hypothesis in Hypothesis 1 and conclude
that there exists an association between rating and number of reviews. A correlation value of 0
indicates a lack of association between the variables of interest (O'Neil & Schutt, 2013). In
this instance the Pearson Correlation Coefficient is 0.034, and the p-value indicates that this
departure from 0 is unlikely to be due to sampling variation alone.
Table 7: Correlation Analysis above also gives the test statistic, a Pearson Correlation
Coefficient of 0.034. Pearson Correlation Coefficient values of -1 and
1 indicate strongest negative correlation and strongest positive correlation respectively
(Shaffer, 2011). Correlation coefficient values close to 0 indicate weak association while those
closer to either -1 or 1 indicate strong association (Oscar, 2009). In this case, the Pearson
Correlation Coefficient of 0.034 implies a very weak positive association: the number of reviews
accounts for only about 0.1% of the variation in rating (0.034 squared is about 0.001). The
association is statistically significant, which is expected even for a tiny coefficient when
there are 108155 observations, but it is too weak to be meaningful. In the sense of Hypothesis
2, we therefore fail to reject the null hypothesis and conclude that there does not exist a
meaningful association between rating and number of reviews.
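To illustrate why a coefficient this small can still be statistically significant, the Python
sketch below converts r = 0.034 and N = 108155 from Table 7 into a t statistic and an
approximate two-sided p-value. The normal approximation to the t distribution is an assumption
that is reasonable only because the sample is very large, and r is used as rounded in the table,
so the results are approximate.

```python
import math

r = 0.034      # Pearson correlation from Table 7 (rounded to three decimals)
n = 108155     # number of restaurants from Table 7

# Share of the variation in one variable associated with the other.
r_squared = r ** 2
# Test statistic for H0: the population correlation is 0, with n - 2 degrees of freedom.
t_stat = r * math.sqrt((n - 2) / (1 - r_squared))
# Two-sided p-value using the normal approximation to the t distribution.
p_value = math.erfc(abs(t_stat) / math.sqrt(2))

print(f"r^2 = {r_squared:.4f}, t = {t_stat:.2f}, approximate p = {p_value:.1e}")
```

The large t statistic comes almost entirely from the sample size, while r squared shows how
little of the variation in rating is associated with the number of reviews.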
Regression Analysis
From Table 1: Variable Summary Description above, we note that Rating, the response (dependent)
variable for this study, is numerical and that there are two independent variables. Hence, we
apply multiple linear regression analysis. This is a technique in which there are two or more
independent variables, which may be numerical or categorical, and a single numerical dependent
variable (Tri & Jugal, 2015).
The results in Table 8: Regression Results: Regression Variables, Table 9: Regression Results:
Model Summary, Table 10: Regression Results: ANOVA and Table 11: Regression Results: Regression
Coefficients below provide the full results of the multiple linear regression for the variables
listed in Table 8: Regression Results: Regression Variables.
Table 8: Regression Results: Regression Variables
Table 9: Regression Results: Model Summary
Table 10: Regression Results: ANOVA
Table 11: Regression Results: Regression Coefficients
The results in Table 9: Regression Results: Model Summary above provide information on the fit
of the regression model. We observe that the Adjusted R Squared = 0.002, meaning that the model
explains about 0.2% of the variation in Rating. 60% is
considered as the cutoff point beyond which a model is taken as fit (Witten, 2011). The 0.2% fit
of this model is far below that cutoff, and hence the regression model is not fit for explaining
the association between the variables. The results given in Table 11: Regression Results:
Regression Coefficients therefore cannot be used to form the equation of a regression model
explaining the association between the response (dependent) variable, Rating, and the
explanatory (independent) variables, Number of Reviews and City.
The results in Table 10: Regression Results: ANOVA, however, test the joint effect of City and
Number of Reviews on Rating. This F-test addresses a different question from the 0.2% fit
reported in Table 9: Regression Results: Model Summary: it asks whether the two variables
jointly explain any of the variation in Rating, not how much they explain. These results can
therefore still be used for the joint relationship analysis. The hypothesis for this case is:
Null Hypothesis: The City and Number of Reviews variables jointly do not have a significant
association with the Rating variable.
Alternative Hypothesis: The City and Number of Reviews variables jointly have a significant
association with the Rating variable.
From the results in Table 10: Regression Results: ANOVA above, we observe that the test
statistic, F = 120.242, and that the Sig. column displays the p-value as .000, that is,
p < 0.001. At an α significance level of 0.05, the p-value is less than α. We therefore reject
the null hypothesis and conclude that the City and Number of Reviews variables jointly have a
significant association with the Rating variable.
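The following Python sketch shows how the F statistic in Table 10 and the fit reported in the
model summary relate to each other. It assumes the model has k = 2 predictors (Number of Reviews
and City entered as a single variable) and n = 108155 observations; the value of k is an
assumption, since the regression tables themselves are not reproduced in the text.

```python
n = 108155        # observations, from the descriptive tables
k = 2             # assumed number of predictors in the regression model
f_stat = 120.242  # F statistic reported for the regression ANOVA

# R squared implied by F: F = (R2 / k) / ((1 - R2) / (n - k - 1)).
r_squared = (f_stat * k) / (f_stat * k + (n - k - 1))
# Adjusted R squared penalises R squared for the number of predictors.
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

print(f"R squared = {r_squared:.4f}, adjusted R squared = {adj_r_squared:.4f}")
```

Under this assumption the implied Adjusted R Squared is about 0.002, in line with the reported
value: with more than 100,000 observations, even a model that explains about 0.2% of the
variation produces a large F statistic.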
Discussion
The analyses in this study reveal the following:
1. At 14.2% and 12.3% respectively, London and Paris have the highest numbers of
destinations reviewed on TripAdvisor in Europe. This may point to greater investment in
the restaurant business in London and Paris than in other European cities. It may also
point to the two cities being the most visited in Europe, making restaurants there more
likely to be reviewed.
2. At 0.4% and 0.5% respectively, Ljubljana and Luxembourg have the lowest numbers of
destinations reviewed on TripAdvisor in Europe. This may point to lower investment in
the restaurant business in Ljubljana and Luxembourg than in other European cities. It may
also point to the two cities being the least visited in Europe, making restaurants there
less likely to be reviewed.
3. The average rating for restaurants across Europe on TripAdvisor is 3.989 out of a
maximum of 5.0. This is a high rating, implying that on average European restaurants are
rated favourably.
4. The average number of reviews for restaurants across Europe is 125 reviews.
5. Rome and Edinburgh had the highest and second highest average number of reviews.
6. Hamburg and Bratislava had the lowest and the second lowest average number of
reviews.
7. The average rating ranges between 3.8 and 4.2 for all cities across Europe.
8. There is a statistically significant difference between the cities with respect to the rating.
9. There is a statistically significant difference between the cities with respect to the number
of reviews.
10. There exists an association between rating and number of reviews.
11. Although there exists an association between rating and number of reviews, the
relationship is positive but too weak to be meaningful.
12. City and Number of Reviews variables jointly have a significant association with the
Rating variable.
Task 7
Recommendations
The analysis in this study reveals that there is no meaningful association between the number of
reviews on a restaurant in Europe and the rating that the restaurant gets on TripAdvisor. This
implies that TripAdvisor can go ahead and reduce the number of reviews on its website with
little risk of affecting the ratings of the restaurants. This is a generalization, however,
because this study is limited to European restaurants and does not cover all destinations in all
locations on TripAdvisor. TripAdvisor can therefore use European restaurants as a pilot phase
for the entire website while further data analysis is conducted on the other destinations and
locations. The pilot can also serve as a test of how customers react to a reduction in the
number of reviews before more research is conducted to enable extension to other destinations
and locations.
Proposal
The analyses in this study are limited in terms of the generalizability of the findings. This is
because the data is restricted first to restaurants and then to restaurants in Europe. If further
work were to remain focused specifically on restaurant reviews as a pilot phase, then there
would be a need to get data that
is more representative of the global customers of TripAdvisor. A sufficiently large random
sample should be collected from the population of all TripAdvisor reviews on restaurants. Such a
sample would be sufficiently representative of the population for the resulting inferences about
the separate and joint relationships of number of reviews and city with ratings to be
generalized. These inferences would inform the decision to keep restaurant reviews that date
back to a given date and discard those that are older.
However, if the interest is in an approach across all destination types rather than using
restaurants as the pilot phase, a sufficiently large random sample of reviews from the whole of
TripAdvisor needs to be collected and analyzed for the same separate and joint relationships, to
inform the decision to keep reviews that date back to a given date and discard those that are
older. The analysis in this instance will involve an extra variable, Destination Type, to
provide more information on whether there exists a significant association between rating and
number of reviews regardless of the destination type.
An additional analysis needs to be conducted if it is established that the number of reviews
does not have any impact on ratings. This would answer the question of how far back reviews
should be kept for them to offer meaningful information. It would involve dividing the data on
ratings and number of reviews into periods of, say, one year. The relationship should then be
evaluated between the reviews in each period and the ratings in the most recent period. This
would show how far back the number of reviews remains relevant in providing meaningful
information.
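The period-by-period analysis described above could be sketched as follows in Python. All data
and names here are hypothetical; the snippet only illustrates computing, for each yearly period,
the Pearson correlation between the number of reviews received in that period and the most
recent rating.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: reviews per restaurant in each year, and each
# restaurant's rating in the most recent period (same restaurant order).
reviews_by_year = {
    2017: [12, 30, 8, 25, 40],
    2018: [20, 35, 10, 22, 50],
    2019: [25, 45, 15, 30, 60],
}
latest_rating = [3.5, 4.0, 3.0, 4.0, 4.5]

# Correlation between each period's review counts and the latest ratings.
r_by_year = {year: pearson_r(counts, latest_rating)
             for year, counts in reviews_by_year.items()}

for year, r in sorted(r_by_year.items()):
    print(year, round(r, 3))
```

Periods whose correlation with the most recent ratings is close to zero would be candidates for
discarding, subject to the same significance tests used elsewhere in this report.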

References
Baack, S 2015, 'Datafication and Empowerment: How the Open Data Movement Re-articulates
Notions of Democracy, Participation and Journalism', Big Data and Society, vol.2, no.2,
pp. 1-11.
Chambers, JM 2017, Graphical Methods for Data Analysis, 1st edn, Chapman and Hall/CRC,
New York.
Conlin, J 2009, The Green Traveller, viewed 15 November 2019,
<https://www.nytimes.com/2009/02/15/travel/15green.html?_r=0>
Damien, B 2017, Kaggle, viewed 16 November 2019,
<https://www.kaggle.com/damienbeneschi/krakow-ta-restaurans-data-raw>
De Jong-Chen, J 2015, 'Data Sovereignty, Cybersecurity, and Challenges for Globalization',
Georgetown Journal of International Affairs, vol.5, no.3, pp. 112-122.
Freedman, DA 2009, Statistical Models: Theory and Practice, 1st edn, Cambridge University
Press, London.
Galit, S, Peter, BC, Inbal, Y, Patel, NR & Kenneth, LC 2018, Data Mining for Business
Analytics, 1st edn, John Wiley & Sons, Inc., New Delhi.
Howitt, D & Cramer, D 2010, Introduction to Descriptive Statistics in Psychology, 5th edn,
Prentice Hall, New York.
Jaulin, L 2010, 'Probabilistic set-membership approach for robust regression', Journal of
Statistical Theory and Practice, vol.5, no.1, pp. 1-14.
28
References
Baack, S 2015, 'Datafication and Empowerment: How the Open Data Movement Re-articulates
Notions of Democracy, Participation and Journalism', Big Data and Society, vol.2, no.2, pp. 1-
11.
Chambers, JM 2017, Graphical Methods for Data Analysis, 1st edn, Chapman and Hall/CRC,
New York.
Conlin, J 2009, The Green Traveller, viewed 15 November 2019
<https://www.nytimes.com/2009/02/15/travel/15green.html?_r=0>
Damien, B 2017, Kaggle, viewed 16 November 2019,
<https://www.kaggle.com/damienbeneschi/krakow-ta-restaurans-data-raw>
De Jong-Chen, J 2015, 'Data Sovereignity, Cybersecurity, and Challenges for Globalization',
Georgetown Journal of International Affairs, vol.5, no.3, pp. 112-122.
Freedman, DA 2009, Statistical Models: Theory and Practice, 1st edn, Cambridge University
Press, London.
Galit, S, Peter, BC, Inbal, Y, Patel, NR & Kenneth, LC 2018, Data Mining for Business
Analytics, 1st edn, John Wiley & Sons, Inc., New Delhi.
Howitt, D & Cramer, D 2010, Introduction to Descriptive Statistics in Psycology, 5th edn,
Prentice Hall, New York.
Jaulin, L 2010, 'Probabilistic set-membership approach for robust regression.', Journal of
Statistical Theory and Practice, vol.5. no.1, pp. 1-14.
28
Secure Best Marks with AI Grader
Need help grading? Try our AI Grader for instant feedback on your assignments.

Data Handling and Decision Making
Karolin, K & Schrape, J 2018, 'Societal Implications of Big Data', Künstliche Intelligenz, vol.32,
no.1, pp. 12.
Ki-Joon, B & Choong-Ki, L 2015, 'Determining the Attributes of Casino Customer Satisfaction:
Applying Impact-Range Performance and Asymmetry Analyses', Journal of Travel & Tourism
Marketing, vol.32, no.6, pp. 747-760.
Kim, HY 2014, 'Analysis of variance (ANOVA) comparing means of more than two groups',
Restorative dentistry & endodontics, vol.39, no.1, pp. 74-77.
Kirk, A 2016, Data Visualization: A Handbook for Data Driven Design, 2nd edn, Sage
Publications, Ltd, Thousand Oaks, CA.
Korkontzelos, I & Nikfarjam, A 2016, 'Analysis of the Effect of Sentimental Analysis on
Extracting Adverse Drug Reactions from Tweets and Forum Posts', Journal of Biomedical
Informatics, vol.62, no.1, pp. 148-158.
Laudon, KC & Guercio, TC 2014, E-commerce. Business. Technology. Society, 10th edn,
Pearson, New York.
Lenca, M & Ferretti, A 2018, 'Considerations for Ethics Review of Big Data Health Research: A
Scoping Review', PLoS ONE, vol.13, no.10, pp. 23-25.
Malki, AA & Rizk, MA 2016, 'Hybrid Genetic Algorithm with K-Means for Clustering
Problems', Open Journal of Optimization, vol.5, no.2, pp. 1-4.
Naomi, BR & Heiberger, MR 2011, 'Plotting Likert and Other Rating Scales', JSM 2011,
vol.2011, no.1, pp. 1058-1066.
O'Neil, C & Schutt, R 2013, Doing Data Science, 3rd edn, O'Reilly, London.
29
Karolin, K & Schrape, J 2018, 'Societal Implications of Big Data', Kunstliche Intelligenz, vol.32,
no.1, pp. 12.
Ki-Joon, B & Choong-Ki, L 2015, 'Determining the Attributes of Casino Customer Satisfaction:
Applying Impact-Range Performance and Asymmetry Analyses', Journal of Travel & Tourisim
Marketing, vol.32, no.6, pp. 747-760.
Kim, HY 2014, 'Analysis of variance (ANOVA) comparing means of more than two groups',
Restorative dentistry & endodontics, vol.39, no.1, pp. 74-77.
Kirk, A 2016, Data Visualization: A Handbook for Data Driven Design, 2nd edn, Sage
Publications, Ltd, Thousand Oaks, CA.
Korkontzelos, I & Nikfarjam, A 2016, 'Analysis of the Effect of Sentimental Analysis on
Extracting Adverse Drug Reactions from Tweets and Forum Posts', Journal of Biomedical
Informatics , vol.62, no.1, pp. 148-158.
Laudon, KC & Guercio, TC 2014, E-commerce. Business. Technology. Society, 10th edn,
Pearson, New York.
Lenca, M & Ferretti, A 2018, 'Considerations for Ethics Review of Big Data Health Research: A
Scoping Review', PLoS ONE, vol.13, no.10, pp. 23-25.
Malki, AA & Rizk, MA 2016, 'Hybrid Genetic Algorithm with K-Means for Clustering
Problems', Open Journal of Optimization, vol.5, no.2, pp. 1-4.
Naomi, BR & Heiberger, MR 2011, 'Plotting Likert and Other Rating Scales', JSM 2011,
vol.2011, no.1, pp. 1058-1066.
O'Neil, C & Schutt, R 2013, Doing Data Science, 3rd edn, O'Reily, London.
29

Data Handling and Decision Making
Oscar, M 2009, A data mining and knowledge discovery process model, 1st edn, Julio Ponce,
Vienna.
Pierre, D 2011, 'A New Perspective on Research Ethics', Health Law Review, vol.19, no.3,
pp. 1-5.
Ren, J & Ying, S 2010, Research and Improvement of Clustering Algorithms in Data Mining,
2010 2nd International Conference on Signal Processing Systems, Dalian, China.
Ruth, S & Brynhildur, D 2017, 'How to measure national energy sustainability performance: An
Icelandic case-study', Energy Sustainability for Development, vol.39, no.1, pp. 29-47.
Shaffer, CA 2011, Data Structures and Algorithms Analysis, Dover, Mineola.
Tri, D & Jugal, K 2015, Select Machine Learning Algorithms Using Regression Models, 1st edn,
2015 IEEE Conference, Boston MA.
TripAdvisor 2019, TripAdvisor, viewed 17 November 2019,
<https://tripadvisor.mediaroom.com/us-about-us>
Vicenc, T 2017, Studies in Big Data, 1st edn, Springer International Publishing, Chicago.
Witten, IH 2011, Data Mining: Practical Machine Learning Tools, 3rd edn, Morgan Kaufmann,
Sydney.
Yu, YP, Omar, R, Harrison, RD, Sammathuria, MK & Nik, AR 2011, 'Pattern Clustering of
Forest Fires Based on Meteorological Variables and its Classification Using Hybrid Data Mining
Methods', Journal of Computational Biology and Bioinformatics Research, vol.3, no.1,
pp. 47-52.
30
Oscar, M 2009, A data mining and knowledge discovery process model, 1st edn, Julio Ponce,
Vienna.
Pierre, D 2011, 'A New Perspective on Research Ethics', Health Law Review, vol.19, no.3, pp. 1-
5.
Ren, J & Ying, S 2010, Research and Improvement of Clustering Algorithms in Data Mining,
2010 2nd International Conference on Signal Processing Systems, Dalian, China.
Ruth, S & Brynhildur, D 2017, 'How to measure national energy sustainability performance: An
Icelandic case-study', Energy Sustainability for Development, vol.39, no.1, pp. 29-47.
Shaffer, CA 2011, Data Structures and Algorithms Analysis, Dover, Mineola.
Tri, D & Jugal, K 2015, Select Machine Learning Algorithms Using Regression Models, 1st edn,
2015 IEEE Conference, Boston MA.
TripAdvisor, 2019. TripAdvisor. viewed 17 November 2019,
<https://tripadvisor.mediaroom.com/us-about-us>
Vicenc, T 2017, Studies in Big Data, 1st edn, Springer International Publishing, Chicago.
Witten, IH 2011, Data Mining: Practical Machine Learning Tools, 3rd edn, Morgan Kaufmann,
Sydney.
Yu, YP, Omar, R, Harrison, RD, Sammathuria, MK & Nik, AR 2011, 'Pattern Clustering of
Forest Fires Based on Meteorological Variables and its Classification Using Hybrid Data Mining
Methods', Journal of Computational Biology and Bioinformatics Research, vol.3, no.1, pp. 47-
52.
30
1 out of 30
Related Documents

Your All-in-One AI-Powered Toolkit for Academic Success.
+13062052269
info@desklib.com
Available 24*7 on WhatsApp / Email
Unlock your academic potential
© 2024 | Zucol Services PVT LTD | All rights reserved.