Text Mining Approach Assignment
Added on 2021/01/02
Profiling Travellers' Mode Choice towards airport access (HKIA) – Introducing the Text Mining
Approach
Introduction (Background, Motivation, Problem Identification, Expected Outcome, Significance)
Situated in the Pearl River Delta, Hong Kong, a regional logistics hub, one of Asia’s top travel destinations
and an international centre, drew more than 58 million visitors in 2018, generating nearly 300 billion in
visitor spending, with an average stay of 3.2 nights (Hong Kong Tourism Board, 2018). Of these visitors, 30
million travelled to Hong Kong by air and landed at the Hong Kong International Airport
(Civil Aviation Department (CAD), 2018), while 4.6 million passengers used HKIA’s cross-boundary
land and sea transport (Airport Authority Hong Kong, 2018). Opened on 6 July 1998, the Hong
Kong International Airport (HKIA) is connected to over 200 destinations worldwide by more than a hundred
airlines. The Airport Authority aims to enhance its capacity as a leading aviation hub to cater for
growing demand and to serve as a key engine of economic growth. With outstanding operational
performance, HKIA is currently ranked 5th among the world’s top 10 airports by Skytrax (Airport Authority
Hong Kong, n.d.).
To accommodate the massive arrivals, the egress links to the city centre, Mainland China and Macau are well
developed. Visitors can choose from a variety of transport modes, ranging from bus services and the Airport
Express to taxis and on-demand transport such as Uber. The airport connects people from around the world
with the aviation system and with the other modes of transport in the city. Passengers’ mode choice is
paramount in evaluating the efficiency of the airport transport system, and it provides valuable insights for
policy making and system planning. Yet there are limited studies on visitors’ preferences in their airport
access mode choice to HKIA. In previous research on airport access mode choice at different airports, such as
in Turkey (Gokasar & Gunay, 2017), in Korea (Choo, You, & Lee, 2013) and at HKIA (Tam, Tam, &
Lam, 2005), data were collected by conducting surveys.
Collecting data through surveys is time consuming, from setting up questionnaires to engaging
with respondents. It is also difficult to recruit respondents over a longer period (more than a week) in order
to determine seasonal trends. Furthermore, sample sizes are small, so the modelling results must be
interpreted with caution. Data collection through the emerging technique of text mining draws on a
much larger pool than surveys, across a longer period, in a shorter time. It also provides a different type
of insight compared with the traditional survey and modelling approach.
Text mining, as a knowledge discovery technique, is acquiring increasing importance in this digital era.
Information is readily available on media such as forums, Facebook and Twitter. The technique, an
extension of extracting logical patterns from structured databases, combines multiple fields to generate
decision analytics from large data sets through information retrieval, text analysis, natural language
processing, and information classification (Irfan, et al., 2015). It covers disciplines in statistics, linguistics
and machine learning, and generally includes categorization of information, clustering of text, extraction of
concepts, and formulation of general taxonomies.
The purpose of the text mining approach is to transform unstructured textual data into meaningful numeric
indices, so that the information contained in the text becomes accessible to various data mining methods.
In general terms, text mining turns text into numbers or meaningful indices that can then be used in other
forms of analysis, such as predictive data mining projects or unsupervised learning methods. The main
approaches to text mining are as follows:
Using well-tested methods: In this approach, once a data matrix has been derived from the input
documents, well-developed and well-known analytical tools and techniques are used to process the data
further. These can include methods such as clustering, factoring or predictive data mining.
Black box approach: A number of text mining applications use a black box method so that deep meaning
can be extracted from documents with only a limited amount of human effort. In this
method, text mining depends mainly on proprietary algorithms that derive concepts from
text. This technology is still regarded as being in its infancy.
Text mining as document search: Another important application that is often described as text mining is
document search within a given domain. Examples are the popular internet search engines that give
individuals efficient access to web pages with relevant content. This type of application is very
beneficial for business entities that have to search data held in very large directories, as it helps them
retrieve the right information within a specific time frame.
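The step of deriving a data matrix from input documents, mentioned under the well-tested methods approach,
can be illustrated with a small sketch. This is only one simple way to turn text into numeric indices (raw
term counts), not necessarily the representation used in this study. The sample posts, the function names and
the tokenization rule are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical sample posts; they are not taken from the study's data.
posts = [
    "Took the Airport Express to Central, fast and clean.",
    "The bus was cheap but the bus ride was slow.",
    "Taxi from the airport was fast but expensive.",
]


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def document_term_matrix(docs):
    """Return (vocabulary, matrix) where matrix[i][j] counts term j in document i."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix


vocabulary, matrix = document_term_matrix(posts)
bus_col = vocabulary.index("bus")
print([row[bus_col] for row in matrix])  # prints [0, 2, 0]
```

Each row of the matrix is one document and each column one term, so the result can be passed to clustering or
predictive methods in the same way as any other numeric table.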
Text mining helps extract
useful information from bulk data efficiently in a short period of time, and assists in predicting
future developments based on the observations and statistics generated from trends in the
data sets (Hashimi, Hafez, & Mathkour, 2015). Social media mining has been employed by many businesses
to perform competitive analysis by transforming data into insights. In contrast with traditional data
analytics, social media tools show the interactivity between users, which plays a crucial role in
changing how people communicate, whereas traditional media engages people in a one-way connection.
Referrals and promotions through social word of mouth also help companies understand their customer base,
which brings business value for companies in developing their marketing and business strategies (Shen,
Chen, & Wang, 2018).
This project combines text mining techniques with social media to unveil travellers’ preferences in
their mode choice for airport access. The motivations are twofold: first, to apply data science methodology to
collect and analyse social media data; second, to present past and current trends in transportation preferences
and their implications, and hence provide useful insights.
Significance
The objectives of this study are as follows:
1. To analyse the concept of text-mining as a new approach to look into mode choices and
transportation
2. To identify the explanatory variables for mode choice
3. To find out travellers’ experience with the transportation system to and from Hong Kong
International Airport
4. To analyse the preferred mode choice
5. To determine the change of preferences over time and seasonal preference
Expected Outcome
Travellers’ preferences and experience with the access modes of Hong Kong International Airport are
expected to be found through parsing and analysing online data. The insights and trends are expected to yield
recommendations for enhancing the current system, policy and planning of airport access modes. Most
importantly, the study is expected to outline the text-mining approach for finding airport access mode choice
and to set the ground for a wider scope of study in the future.
Literature Review
Airport access mode choice
To facilitate the advancement of airport management, understanding air passengers’ views on
airport access modes is of crucial importance. Alhussein (2011) carried out the very first research on ground
access mode choice to King Khaled International Airport (KKIA) in Riyadh, Saudi Arabia, aiming to
analyse access mode behaviour to KKIA. Tam, Tam and Lam (2005) examined the access mode choices of
departing passengers to provide source information for transport operators to improve their
service planning and increase their shares of the airport ground access market. Choo, You and Lee (2013)
explored passengers’ airport access mode choice and developed mode choice models after conducting
Chi-square and ANOVA tests to identify key explanatory variables of the airports. All of these studies
have one thing in common: data were collected through surveys or face-to-face interviews conducted at
the terminals, targeting departing passengers at random.
In the research by Tam, Tam and Lam (2005), not only were the structural relations between passengers’
personal characteristics and trip characteristics included, but also Expectation and Perception, two
latent variables that previous research had not taken into account. Personal and trip characteristics including
gender, age, education level, flight length and travel cost all negatively affect the use of public transport
modes for airport ground access, as also suggested by Alhussein (2011). Public transport dominates as the
preferred mode choice in Hong Kong, in contrast to western countries. Visitors who are on business trips or
who visit HKIA less frequently tend to select private cars or taxis as their ground access mode. The results
indicated that respondents’ perceived levels of satisfaction were lower than their expectations on the five
selected service attributes (franchised bus, AEL, private car, taxi, others). Passengers found travel time
reliability to be the most satisfactory service attribute, while waiting time for franchised buses, walking
distance to and from the Airport Express stations, travel cost for taxi and private car, and waiting time for
airport shuttle buses offered by hotels and travel agencies all had a high priority for improvement.
Alhussein (2011) and Tam, Tam and Lam (2005) suggested that future studies could collect data to determine
the effects of travel seasons on airport ground access mode choice, with the inclusion of more service
attributes and the latent variables.
Text mining and social media
Social media such as online forums have gained increasing popularity for exchanging ideas and advice.
Mining these online communities could be rewarding. Park, Conway and Chen (2017) employed
text mining, qualitative analysis and visualization to compare online discussion content from three
online mental health communities. The corpus was downloaded using the Python Reddit API Wrapper (PRAW).
The Python Natural Language Toolkit and Scikit-learn were then used to pre-process the dataset: tokenization
and removal of stop words, punctuation, and both high- and low-frequency terms. K-means clustering followed
to identify the main discussion themes in a large collection of documents. The frequency of term
appearance was then visualized as a bubble chart, proportional to the cluster size, using D3, and as a network
visualization using Gephi. A Venn diagram was used to visualize the thematic overlaps among the three online
communities, and a qualitative comparison was carried out as a result. The Louvain modularity algorithm (in
Gephi) and a heatmap visualization of Jaccard similarity scores were used to illustrate how clusters are
topically similar to and dissimilar from one another. The research findings facilitate more nuanced discussions
and encourage future research to include multiple methods in fully understanding the differences among
conditions with shared symptomatology. The approach nonetheless serves as a valuable takeaway for analysing
and visualizing textual comparisons.
Social media is a modern channel through which companies can raise their profile among a large number of
people at very high speed. It is arguably a highly effective approach because it supports communication with
a large audience. People of every generation now have social media accounts, which assures a
company that any information it shares on these platforms can reach everyone, from the young to the
elderly. As a result, the shared information can bring business to the company and
contribute to attaining its targets, and can improve profitability in a short time.
Social media sites are also helpful for gathering suggestions from customers, as users
can share their personal experiences. On the basis of that experience, they also offer advice on the
company's official account, which helps it identify weaknesses. By addressing these issues, the
company can work on the areas mentioned and enhance the quality it provides to customers.
In the context of text mining, the raw data found on these platforms can be
converted into meaningful data, and social media can in turn be used to share the resulting
information with a wide audience.
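The pre-processing and Jaccard similarity steps reported by Park, Conway and Chen (2017) can be sketched in
plain Python as follows. This is a simplified illustration under stated assumptions, not their implementation:
the stop-word list, the frequency thresholds, the sample sentences and the two term groups are hypothetical,
and the clustering step is left out to keep the example short.

```python
import re
from collections import Counter

# Assumed, deliberately tiny stop-word list; a real study would use a fuller one.
STOP_WORDS = {"the", "a", "to", "and", "is", "was", "of", "in", "for", "but"}


def preprocess(docs, min_df=1, max_df_ratio=1.0):
    """Tokenize, drop stop words, and drop terms whose document frequency is
    below min_df or above max_df_ratio of all documents."""
    tokenized = [
        [t for t in re.findall(r"[a-z]+", d.lower()) if t not in STOP_WORDS]
        for d in docs
    ]
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(t for tokens in tokenized for t in set(tokens))
    limit = max_df_ratio * len(docs)
    keep = {t for t, df in doc_freq.items() if min_df <= df <= limit}
    return [[t for t in tokens if t in keep] for tokens in tokenized]


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two term collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


# Hypothetical sample sentences.
docs = [
    "The bus to the airport was slow",
    "The taxi to the airport was fast",
    "Airport Express is fast",
]
cleaned = preprocess(docs, min_df=2, max_df_ratio=0.9)
print(cleaned)

# Hypothetical term groups standing in for two discussion themes.
theme_a = ["bus", "fare", "queue", "luggage"]
theme_b = ["taxi", "fare", "queue", "night"]
print(jaccard(theme_a, theme_b))
```

With these thresholds, "airport" is dropped as a high-frequency term because it occurs in every sentence, and
terms that occur in only one sentence are dropped as low-frequency terms. The two themes share two of their six
distinct terms, so their similarity is one third.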
Methodology
Text-mining
This is considered the most common approach; it involves a representation of text that is
based on keywords for searching the data. The keyword approach can be combined with statistical
methods such as pattern recognition and machine learning, which helps discover the
relationships among different elements in the text. In addition, the text-mining approach is also based on
recent technologies such as Artificial Intelligence, semantics and more. These help a system gain a deeper
understanding of various languages in order to understand a text, and enable it to extract the desired
information and the hidden knowledge in text comments more accurately. They also improve the overall
analysis and the management of information. For this purpose, the text mining approach uses
various applications such as knowledge management software, customer intelligence software and entity
extraction software. In the present dissertation, the researchers have used the Reddit and Tripadvisor
platforms to extract information about “Hong Kong Airport Transportation”.
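A keyword-based representation of this kind can be sketched as follows for the airport access setting. The
keyword lexicon, the mode labels and the sample posts are hypothetical and chosen only for illustration; a
real lexicon would need to be built and validated against the collected forum data.

```python
import re
from collections import Counter

# Hypothetical keyword lexicon mapping each access mode to search patterns.
MODE_KEYWORDS = {
    "bus": [r"\bbus(es)?\b"],
    "airport_express": [r"\bairport express\b", r"\bael\b"],
    "taxi": [r"\btaxis?\b", r"\bcab\b"],
    "ride_hailing": [r"\buber\b"],
}


def modes_mentioned(post):
    """Return the set of access modes whose keywords appear in a post."""
    text = post.lower()
    return {
        mode
        for mode, patterns in MODE_KEYWORDS.items()
        if any(re.search(p, text) for p in patterns)
    }


def count_mode_mentions(posts):
    """Count, per mode, how many posts mention it at least once."""
    counts = Counter()
    for post in posts:
        counts.update(modes_mentioned(post))
    return counts


# Hypothetical forum posts.
posts = [
    "Is the Airport Express worth it or should I take a taxi?",
    "We took the bus to the city, very easy.",
    "AEL is fastest, buses are cheapest.",
]
counts = count_mode_mentions(posts)
print(counts["airport_express"], counts["bus"], counts["taxi"], counts["ride_hailing"])
# prints 2 2 1 0
```

Counting each post at most once per mode keeps a single long post from dominating the totals. Statistical or
machine learning methods could then be applied to these counts.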
Reddit is a popular American social networking and idea-exchange platform. Only
registered members can submit content to the site, such as links, images and text posts, which are then voted
up or down by other members. Posts are organised by subject into user-created boards called
“subreddits”, which cover a broad range of topics including news, books, fitness, food, video games, science
and image sharing. As the site is a collection of entries submitted by registered users,
it works much like a bulletin board system. It has more than 330 million active users, nearly 150K
active communities and 1.2 billion comments (Reddit, 2018).
Tripadvisor is an American online community dedicated to travel and tourism. The site
shows travel-related content such as reviews of hotels, restaurants and accommodation, bookings and
more. In addition, Tripadvisor Inc. offers features such as travel forums, where travellers
share their travel experiences. These forums are described as a non-commercial, independent and
friendly community of expert travellers. Here, they can share their experiences with one another and help
each other plan a perfect trip. Today, more than 950k users are connected on this site.
Tripadvisor is considered the largest travel website in the world. It appears among the top search results
and provides over 315 million traveller reviews and more than 500 million reviews of travel-related
businesses such as attractions, hotels and restaurants, which enhance the travel experience. The researchers of
the present dissertation therefore also use Tripadvisor when searching for information related to “Hong Kong
Airport Transportation” on Google.
Data selection
To conduct research in the desired manner, project-makers can use various methodologies to collect
data. These include primary and secondary sources, which provide various means of gathering relevant
information on a chosen topic. In primary research, researchers conduct the investigation
themselves by selecting a sample of respondents from the total population. To obtain feedback,
they use primary methods such as questionnaires, online or offline surveys and interviews. However, since the
present dissertation is based on analysing travellers' choices and travel-related data for Hong Kong Airport,
conducting primary research to obtain feedback from such respondents would be very difficult.
It would also be time consuming and not cost effective. To overcome these drawbacks, it is
better to use secondary sources, such as books, journals and articles, from which
the desired information can be obtained. In the present research, text mining is used to gather
data related to travel experience from online platforms where travellers' reviews and
feedback can be obtained. Examples are the Tripadvisor and Reddit sites, which provide options to
search travel-related data. They also host public forums where travellers can share travel
information and reviews of hotels and restaurants.
Using the secondary research method will also help the researcher reach favourable
outcomes within a specific time frame. Secondary methods are generally
fruitful for a researcher, but in a range of cases irrelevant information may be gathered, which
can affect the whole investigation. With that caveat, the researcher can gain substantial benefit from this
method while conducting the investigation on the set topic. Along with this, using a secondary information
The publicly available content from each forum discussion will be downloaded using Python. This tool is
used in web scraping for extracting and processing large amounts of data from the internet. Therefore, for
this process, to analyse a large dataset from public forums, researchers need to be able to scrape data.
Three Python libraries are used here: the Beautiful Soup module for extracting data from the web, the
pandas library for data manipulation and the Matplotlib library for data visualization. To crawl data from
Reddit, the RAW package is used, while Scrapy is used for scraping Tripadvisor. Crawls on each site focus
on the discussion content, the time stamp and the user’s origin. These two forums are selected as the
sources of information. Searches on “HK Airport” will be examined in both forums, and for Reddit under the
subreddits r/HongKongTravel and r/HongKong. Discussions from 2008 to 2018 related to “HK Airport” are
downloaded. Only English-language discussions are processed.
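The selection rules described above (search phrase, 2008 to 2018, English only) can be sketched as a
filter over already-crawled entries. This is a minimal sketch and not the crawler itself: the Post record
and its field names are hypothetical, and the ASCII-share test is only a crude stand-in for a proper
language check.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Post:
    """One crawled forum entry (hypothetical field names)."""
    text: str          # discussion content
    posted: datetime   # time stamp
    origin: str        # user's stated origin, "" when unknown
    forum: str         # "tripadvisor" or "reddit"


def in_scope(post, keyword="hk airport", first_year=2008, last_year=2018):
    """Keep posts from the study period that mention the search phrase."""
    if not first_year <= post.posted.year <= last_year:
        return False
    return keyword in post.text.lower()


def looks_english(text, threshold=0.9):
    """Crude check (assumption): share of ASCII characters among letters."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return False
    return sum(c.isascii() for c in letters) / len(letters) >= threshold


posts = [
    Post("Best way from HK Airport to Kowloon?", datetime(2016, 5, 2), "Singapore", "tripadvisor"),
    Post("Third runway debate", datetime(2017, 1, 9), "", "reddit"),
    Post("HK Airport bus A21 at night", datetime(2005, 3, 1), "UK", "tripadvisor"),
    Post("香港機場 HK Airport 交通", datetime(2015, 7, 7), "", "reddit"),
]

# Only the first post passes: the others lack the phrase, fall outside
# the study period, or are mostly non-English.
selected = [p for p in posts if in_scope(p) and looks_english(p.text)]
```

Keeping the filter separate from the crawl means the same rules can be applied to both forums' exports.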
Data pre-processing
Data extracted from public forums with a web-scraping tool also contain various unwanted and unreliable
information. Python is used here for web scraping and for manipulating and cleaning the data. Before the
downloaded text is analysed, duplicated topics will be deleted, and crawled replies that were separated
into two entries will be merged back into one. Both steps are done through Excel VBA. Discussion topics
and replies will be kept separate to avoid misleading information. Next, stop words will be removed
using the Python Natural Language Toolkit (NLTK) (Version 3.4) (NLTK Project, n.d.). In addition to
traditional English stop words, shorthand notations and slang are also filtered, with reference to
Netlingo and an SMS dictionary. Lemmatization will then be performed to correct stemmed words, such as
“hapi” to “happy” (Shen, Chen, & Wang, 2018). In summary, the dataset is pre-processed through stop-word
removal, tokenization and lemmatization.
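The cleaning steps above can be sketched in Python as follows. This is only an illustration under stated
assumptions: the study performs de-duplication and merging in Excel VBA, the reply ids are hypothetical,
and the two small word lists stand in for the NLTK English stop words and the Netlingo/SMS dictionaries.
Lemmatization is left out because it needs a lexical resource such as the one NLTK provides.

```python
import re

# Tiny illustrative lists (assumptions), not the real stop-word or slang dictionaries.
STOP_WORDS = {"the", "a", "to", "is", "from", "and", "of", "in", "it"}
SHORTHAND = {"lol", "btw", "thx", "pls"}
BLOCKED = STOP_WORDS | SHORTHAND


def drop_duplicate_topics(topics):
    """Remove repeated topic titles, keeping the first occurrence."""
    seen, unique = set(), []
    for title in topics:
        key = title.strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(title)
    return unique


def merge_split_replies(entries):
    """Join (reply_id, fragment) entries that share an id, in crawl order."""
    merged = {}
    for reply_id, fragment in entries:
        merged.setdefault(reply_id, []).append(fragment.strip())
    return {rid: " ".join(parts) for rid, parts in merged.items()}


def tokenize_and_filter(text):
    """Lower-case, split into word tokens, drop stop words and shorthand."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [t for t in tokens if t not in BLOCKED]


entries = [
    (1, "Take the Airport Express"),
    (1, "to Central, thx"),
    (2, "The A21 bus is cheap"),
]
replies = merge_split_replies(entries)
tokens = tokenize_and_filter(replies[1])
```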
Term frequency-inverse document frequency (TF-IDF) analysis
TF-IDF weights a term’s frequency in a document inversely to the number of documents that contain the
term, so each term has both a TF score and an IDF score. The term frequency weight converts the occurrence
frequency of a word into a weighted value. For a term t in a document d, the tf-idf score is the product
of the term’s TF and IDF weights. The tf-idf scores distinguish the keywords of the documents and hence
help to find the most and least popular airport access modes.
TF-IDF is a numerical statistic that reflects how important a term is to a document within a collection.
As the data in the present research are obtained through text mining and web scraping, such a measure is
essential for separating informative terms from noise so that better outcomes can be obtained. Search
engines often use variations of the tf-idf weighting scheme as a central tool for scoring and ranking the
relevance of a document. The scheme is also used for stop-word filtering in various fields, including text
summarisation and classification.
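As an illustration of the weighting, the sketch below computes tf-idf for tokenised documents using one
common variant: tf as the relative frequency of the term in the document and idf as log(N / df). Several
variants exist and the text does not say which one the study uses, so the exact formula here is an
assumption.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per tokenised document.

    tf(t, d) = count of t in d / number of tokens in d
    idf(t)   = log(N / df(t)), where N is the number of documents
               and df(t) is the number of documents containing t
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


docs = [
    ["airport", "express", "fast"],
    ["airport", "bus", "cheap"],
    ["airport", "taxi", "expensive"],
]
scores = tf_idf(docs)
# "airport" occurs in every document, so its idf, and hence its score, is zero;
# mode-specific words such as "express" receive a positive weight.
```

A term that appears everywhere carries no weight, which is why the scheme can double as a stop-word filter.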
Concept Linking
To discover how two words or concepts are associated, concept linking looks for connections between
related data based on their co-occurrence and closeness within the document, which indicates the relative
strength of the associated terms (Shen, Chen, & Wang, 2018). The strength of association can indicate how
each transport mode is linked to other things, for example MTR and positive experience, or overnight bus
and overnight arrivals. It establishes grounds for visualizing the association between travellers’
experience and their mode choice. According to Shen, Chen and Wang (2018), the most relevant words and
terms can be explored under their assumptions using three quantities: k, the number of documents that
contain both term A and term B; n, the number of documents that contain B; and p, the probability that
term A occurs when term B occurs (assuming both terms are independent of each other).
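The strength formula itself is not reproduced in this copy of the text. One formulation that is consistent
with the definitions of k, n and p is a binomial-tail strength, ln(1 / P(X >= k)) with X ~ Binomial(n, p).
The sketch below assumes that formulation; it is not confirmed as the one used by Shen, Chen and Wang
(2018). Estimating p as the share of all documents that contain term A is a further assumption that
follows from the stated independence.

```python
import math


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, r) * p**r * (1 - p)**(n - r) for r in range(k, n + 1))


def link_strength(k, n, p):
    """ln(1 / P(X >= k)); larger values mean the co-occurrence is less chance-like."""
    tail = binomial_tail(k, n, p)
    if tail == 0:
        return math.inf
    return math.log(1.0 / tail)


# Toy corpus: each document is the set of terms it contains.
docs = [
    {"mtr", "fast"}, {"mtr", "fast"}, {"bus", "cheap"},
    {"mtr", "crowded"}, {"taxi", "fast"}, {"bus", "slow"},
]
term_a, term_b = "fast", "mtr"
n = sum(term_b in d for d in docs)                   # documents containing B
k = sum(term_a in d and term_b in d for d in docs)   # documents containing A and B
p = sum(term_a in d for d in docs) / len(docs)       # assumed estimate of p
strength = link_strength(k, n, p)
```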
Sentiment analysis
Stanford NLP is used to analyse the corpus. The underlying technology of this package is a new type of
Recursive Neural Network that builds on top of grammatical structures. The sentiment of each word is
analysed and given a score. The sentiment scores are calculated and classified into five sentiment classes:
very negative, negative, neutral, positive and very positive, where the total sentiment scores for the text will be
evaluated. This analysis gives the overall impression of, and sentiment towards, the transportation
system linking to the airport.
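To show how five-class labels could be turned into an overall score, the sketch below maps each class to
an integer and averages per text and per year. The numeric mapping (-2 to +2), the use of a mean and the
input layout are assumptions made for illustration. The labels are taken as already produced by the
sentiment tool, and no call to that tool is shown.

```python
from statistics import mean

# Assumed numeric scale for the five sentiment classes.
CLASS_SCORE = {
    "very negative": -2,
    "negative": -1,
    "neutral": 0,
    "positive": 1,
    "very positive": 2,
}


def text_score(labels):
    """Mean score of the labelled units in one text; 0.0 for an empty text."""
    if not labels:
        return 0.0
    return mean(CLASS_SCORE[label.lower()] for label in labels)


def yearly_scores(labelled_texts):
    """Average text score per year from (year, labels) pairs, sorted by year."""
    by_year = {}
    for year, labels in labelled_texts:
        by_year.setdefault(year, []).append(text_score(labels))
    return {year: mean(scores) for year, scores in sorted(by_year.items())}


labelled_texts = [
    (2015, ["positive", "neutral"]),
    (2015, ["very positive"]),
    (2016, ["negative", "negative", "neutral"]),
]
trend = yearly_scores(labelled_texts)
```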
Visualization
A word cloud will be used to present the term-frequency analysis, with word size proportional to the
frequency of terms, for recent years and for the past 10 years. A concept link network will be used to
portray the terms most related to each transport mode for recent years. A line chart will be used to show
how the sentiment scores of the transport system change over time.
Preliminary Findings
For the initial stage of the project, data are crawled from the two selected forums: Tripadvisor and Reddit.
According to the literature reviewed, people’s preferred mode choice in the past was public transport. This
finding establishes grounds for comparison with the data pulled from the web.
200 of 1003 pages were scraped from Tripadvisor by searching for “HK Airport transport”, with 1396
replies exported to 3615 entries in .csv format. Since some replies are split across several entries, a
VBA program must be run to combine the separated entries back into single replies. The majority of the
forum posts and replies are from Singapore, China, Australia, USA, UK, Malaysia, South Africa and India
(listed in no particular order). Countries such as Canada, Israel, Sweden, Spain, Macau and Indonesia
also appeared, but less frequently. Users posted enquiries about taxis, airport-hotel shuttles, the
Octopus card and the transport modes connecting to mainland China and Macau. Some unrelated posts were
also scraped, namely “Pre-arrival registration for Indian Nationals” and “HK visa for Vietnamese
passport”. The scraped data span 2007 to 2018, with a concentration of posts between 2015 and 2017. This
can be explained by the increasing popularity of social media and the rise of Tripadvisor. Web
applications of this kind also have negative aspects: a number of posts by visitors to HKIA claimed that
staff of the airport discriminated against certain communities, such as Indians.
This suggests that web applications are not only beneficial to individuals but can also create problems
for a community and become part of social affairs. Reddit was found to have the same issues, which could
lead people to lose interest in a place they were willing to visit. It is therefore essential for both
web applications to look into these cases and keep a close eye on the content published for their users,
so that individuals gain the maximum benefit. Apart from this, Singapore, China, Australia, USA, UK,
Malaysia, South Africa and India are among the countries whose citizens are mainly active on different
social media. This is the basic reason why Tripadvisor and Reddit started receiving posts soon after the
organisations began operating at an international level. In addition, the HK visa arrangement for
Vietnamese passport holders has also been improved.
Thus, it is essential for international airports in places such as Hong Kong and Bangkok to develop
policies in good time in order to offer individuals a good range of benefits. Most of the 1396 replies are
in favour of HKIA, which also helps the airport make modifications as required. However, issues such as
ethical hackers remain a crucial barrier for companies that do business on the internet. This is an
important subject that needs to be researched in depth. On the other hand, in terms of numbers, Reddit
has more clients or subscribers than Tripadvisor. This means that the company would need to make
alterations according to the needs of its clients. Along with this, “One Solution for Everything” is an
approach followed by both web-based organisations (Tripadvisor and Reddit), which lets website visitors
make payments, look at hotels and read the reviews given by people who have already visited the places.
This approach is followed by most companies that are looking to expand their business. It has helped both
companies (Tripadvisor and Reddit) reach new heights. HKIA, which is one of
the famous international airports, consists of two terminals, both of which offer a range of services
that are helping HKIA reach new heights. Tripadvisor has a section where visitors can share the
experiences they had after visiting a place. Along with this, ticket booking, car rentals, sightseeing,
hotels and sometimes travel-related shopping are among the services that Tripadvisor offers. Moreover, an
individual does not need to use a separate portal for each of them. People these days prefer to have
everything in one place, which makes Tripadvisor and similar website companies more effective in the
modern business marketplace.
200 pages were scraped from Reddit by searching for “HK Airport transport”, with 1421 replies exported
in .csv format. Due to privacy restrictions, the location of the poster cannot be identified. Replies
related to the project topic are far fewer than those found on Tripadvisor. Posts ranging from the third
runway debate to memes can be found. Topics posted within the last ten years were downloaded, with an
average timestamp of 3.65 years ago. Reddit has many more active users than Tripadvisor, but the amount
of useful data parsed is limited. The information gathered from Tripadvisor, on the other hand, shows a
range of experiences of HKIA. For example, one visitor found Hong Kong International Airport quick to
get through: most of the stores at the airport were just opening, but there was a wide variety of them,
and it was easy to find a quick souvenir such as packaged food or Lego sets from China.
Discussion
The preliminary findings suggest that Tripadvisor is an excellent platform to parse data from, owing to
its rich discussion of travel issues. On Reddit, by contrast, the number of discussions about Hong Kong
International Airport transportation is trivial and contributes little to data processing. In future
work, removing Reddit from the data collection process could be considered. The data collected so far
suggest that a sufficient amount of information can be gathered within a specific time frame.
The data crawled from Tripadvisor give a brief idea of the big picture for the final results: the
experiences and preferences found are heavily weighted towards US, UK and Australian travellers, followed
by Southeast Asian and Chinese travellers, while experience from Europe, the Middle East and East Asia is
missing. Future studies can compare this result with the share of travellers from different parts of the
world to validate how representative the results are. Based on the findings, Tripadvisor is one of the
leading travel and tourism websites, helping its clients gain in-depth knowledge of the hotels,
restaurants and other places they are looking forward to visiting. Using the site also brings benefits
such as automation: an individual can obtain a good range of information instantly, without talking to
anyone. From a business point of view, operations such as bookings, information, billing and invoicing
are all automated, which saves time and effort. In the same way, HKIA has also kept its focus on
automation, which benefits visitors, flight attendants, clients and others in different ways.
As a result, HKIA has become world famous and continuously offers customers a range of services, from
stores to a gaming zone. In the experiences shared on Reddit, visitors mentioned that Hong Kong Airport is
good in most areas, whether the food court, the shopping area or elsewhere. Efficient, clean and modern
are some of the qualities that HKIA has focused on. One review noted that the food court is being
renovated, so the choice of vendors and the seating are limited for the time being, and that it is rather
expensive but not more so than other large airports. Queues at outlets of large, famous chains such as
Starbucks are always long, but just past it, near the rest rooms and smoking lounge, is a bakery that
sells egg tarts, a Hong Kong speciality, at reasonable prices and makes great coffee at half the price of
the famous brands. Another reviewer reported that the price of water is high, although it is not when
compared with other famous international airports. On the other hand, both web applications, Tripadvisor
and Reddit, have developed in a way that lets their target clients book tickets, look at hotel prices,
sightseeing and car rentals, and sometimes do travel-related shopping as well. This approach is
range of services that are helping HKIA in reaching to new heights. Tripadvisor is carrying a section where
visitors could share their experiences that they had after visiting the place. Along with this, booking tickets,
car rentals, sightseeing, hotels, and sometimes travel related shopping are some of factors that Tripadvisor
has included as its services. Moreover, for all these, an individual is not at all required to create a single
portal for each of them. Plus, people these days prefer everything at one place which makes this Tripadvisor
and other website companies to become more effective at modern business markettplace.
200 pages were scraped from Reddit while searching for “HK Airport transport”, with 1421 replies exported
in the form of .csv. Due to privacy issue, the location of the post cannot be identified. Yet, replies related to
the project topic are far less than those found in Tripadvisor. Posts ranging from the third runway debate to
memes can be found. Topic posted within ten years are downloaded, with an average timestamp of 3.65
years ago. Active users in Reddit are much more than Tripadvisor, however, the number of useful data
parsed is limited. On the other hand, if it is talked about HKIA, the information which is being carried out
from Tripadviser, it is showing a range of experiences which is being carried out through a proper
investigation, it shows that Hong Kong International Airport is basically real quick because most of the
stores at Airport were just opening up but there was a wide variety of them and an individual could easily
become able to find a fast souvenir such as packaged food and legos of China and so on.
Discussion
The preliminary findings suggested that Tripadvisor serves as an excellent platform to parse data from due
to its rich discussion on travel issues. While in Reddit, the number of discussion regarding travel issues in
Hong Kong International Airport transportation is trivial, making an insignificant contribution to data
processing. In the future plan, parsing data from Reddit could be considered taken out in the data collection
process. With the help of collected information, it can easily be said that right amount of information can be
carried out in specific time frame.
The data crawled from Tripadvisor has given a brief idea of the big picture for the final results: experience
and preferences found has a heavy weighting of US, UK and Australia travellers, followed by Southeast
Asians and China; where the Experience from Europe, Middle East and East Asia is missing. Future studies
can compare this result with the portion of travellers from different parts of the world to validate how
representing the results are. Based on the findings it can easily be said that Tripadvisor is one of the leading
travel and tourism website company which is rapidly helping its clients to have an in-depth knowledge or
look towards the hotels, restaurants and other places where they are looking forward to visit. Along with
this, using this site carry a range of benefits like Automation, where an individual would directly carry out a
good range of information which is specifically available instantly, without talking to anyone about the
same. From a business point of view, all the operations like bookings, information, billing, invoicing, and
every other thing is automated which leads to saving time and effort. In the same way, HKIA is also has kept
its focus on automation as well, which offers both visitors, flight attenders, clients and others in different
ways.
Partly as a result, HKIA has become world famous and continues to offer customers a range of services, from
stores to a gaming zone. According to experiences shared on Reddit, visitors consider Hong Kong Airport good
in most areas, whether the food court, the shopping area or elsewhere. Efficiency, cleanliness and modernity
are some of the factors HKIA has focused on. Reviewers noted that the food court is being renovated, so the
choice of vendors and seating will be limited for the time being, and that prices are rather expensive but no
more so than at other large airports. Outlets of large, well-known chains such as Starbucks tend to have long
queues, but just past Starbucks, near the rest rooms and smoking lounge, is a bakery that sells egg tarts, a
Hong Kong speciality, at reasonable prices and makes great coffee at half the price of the famous brands.
Another reviewer reported that the price of water is high, although it is not when compared with other major
international airports. Both Tripadvisor and Reddit have also developed in a way that lets their target users
book tickets and look up hotel prices, sightseeing, car rentals and sometimes travel-related shopping. This
approach is
popular with users because, nowadays, people do not want to visit several different sites and prefer to have
everything in one place.
In this way they can save both time and money. It was also found that countries such as Sweden, Spain,
Canada, Israel, Macau and Indonesia appear infrequently on both travel-related web applications. Users from
these countries raised only a small range of enquiries, mainly about taxis, the Octopus card, airport-hotel
shuttles, and the transport modes for travelling onward to places such as Macau and China. Tripadvisor states
that reviews are not posted to the website instantly but are subject to a verification process which considers
the IP address and email address of the author and tries to detect suspicious patterns or obscene or abusive
language. This helps to filter out comments that could cause problems not only for the company but for
different communities as well.
Future Plans
This study is still ongoing work. The following tasks are planned for the rest of the project.
A trial run will be carried out on the scraped data to test the term-frequency analysis, concept links and
sentiment analysis. Next, the full web data of 1003 pages will be downloaded and processed following the steps
proposed in the methodology to obtain the necessary information and visualise the findings. Separately, web
applications such as Tripadvisor have kept their focus on security, which has become a main concern for every
web application. Cashless payment also plays a major role in the success of such a business, as it allows
users to book tickets without paying anyone in person. Those building a travel and tourism web application
therefore need to choose their options wisely, because one bad decision can cost a travel and tourism business
both time and money. With this in place, Tripadvisor and Reddit can secure a good position in the marketplace.
Tripadvisor would also need to make alterations as requirements change.
The data gathered from Tripadvisor shows only a limited range of issues, so the results do not yet capture the
full picture: the preferences and experiences found are heavily weighted towards travellers from the UK, the
US and other countries such as Australia, followed by Southeast Asians and travellers from China, while the
experience of travellers from the Middle East, Europe and East Asian countries is missing. Future studies can
compare this result with the share of travellers from different parts of the world to validate how
representative the results are for the destination concerned. According to Tripadvisor's website, a branding
change had been planned for some time, and the changes began in June 2011. Looking ahead, there remains a
concern that consumers might be fooled by fraudulent posts, since entries could be made without any form of
verification. Tripadvisor and Reddit therefore need to use advanced and highly effective fraud detection
systems to help both organisations remove fake content related to HKIA or any other subject. The research
findings facilitate more nuanced discussion and encourage future research to include multiple methods in fully
understanding the differences among conditions with shared symptomatology.
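The term-frequency step of the trial run planned above could be sketched roughly as follows. This is only a
minimal illustration using the Python standard library rather than NLTK, which the study cites; the stop-word
list, the function names and the sample posts are all hypothetical and are not data or code from the study.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real run would likely use a fuller list
# (for example the one distributed with NLTK).
STOP_WORDS = {"the", "a", "an", "to", "from", "is", "it", "and", "or", "of",
              "in", "on", "for", "i", "we", "was", "at", "but", "with"}


def tokenize(text):
    """Lowercase the text and return alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def term_frequencies(posts):
    """Count how often each remaining term occurs across all posts."""
    counts = Counter()
    for post in posts:
        counts.update(tokenize(post))
    return counts


# Made-up example posts, for illustration only (not scraped data).
sample_posts = [
    "Took the Airport Express to Central, fast and clean.",
    "The taxi from the airport was expensive but convenient.",
    "Airport Express or bus? The bus is cheaper.",
]

frequencies = term_frequencies(sample_posts)

# Print the five most frequent terms with their counts.
for term, count in frequencies.most_common(5):
    print(term, count)
```

On real scraped pages the same counting would be applied after the cleaning steps set out in the methodology,
and the resulting counts could feed the concept-link and sentiment analyses.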
Bibliography
Civil Aviation Department (CAD). (2018). Air Traffic Statistic. Retrieved from Civil Aviation Department:
https://www.cad.gov.hk/english/pdf/Stat%20Webpage.xlsx
Hong Kong Tourism Board. (2018). Hong Kong Tourism Board Annual Report 2017/18. Hong Kong: Hong
Kong Tourism Board.
Airport Authority Hong Kong. (n.d.). Retrieved from Hong Kong International Airport:
https://www.hongkongairport.com/en/
Airport Authority Hong Kong. (2018). Hong Kong International Airport Annual Report 2017/18. Hong
Kong.
Irfan, R., King, C. K., Grages, D., Ewen, S., Khan, S. U., Madani, S. A., . . . Li, H. (2015). A survey on text
mining in social networks. The Knowledge Engineering Review, 30(2), 157-170.
Hashimi, H., Hafez, A., & Mathkour, H. (2015). Selection criteria for text mining approaches. Computers in
Human Behavior, 51, 729-733.
Shen, C.-w., Chen, M., & Wang, C.-c. (2018). Analyzing the trend of O2O commerce by bilingual text
mining on social media. Computers in Human Behavior.
Gokasar, I., & Gunay, G. (2017). Mode choice behavior modeling of ground access to airports: A case study
in Istanbul, Turkey. Journal of Air Transport Management, 59, 1-7.
Choo, S., You, S. I., & Lee, H. (2013). Exploring characteristics of airport access mode choice: a case study
of Korea. Transportation Planning and Technology, 36(4), 335-351.
Tam, M., Tam, M., & Lam, W. (2005). Analysis of airport access mode choice: a case study in Hong Kong.
Journal of the Eastern Asia Society for Transportation Studies, 6, 708-723.
Alhussein, S. N. (2011). Analysis of ground access modes choice King Khaled International Airport,
Riyadh, Saudi Arabia. Journal of Transport Geography, 19, 1361-1367.
Park, A., Conway, M., & Chen, A. T. (2017). Examining thematic similarity, difference, and membership in
three online mental health communities from reddit: A text mining and visualization approach.
Computers in Human Behavior, 78, 98-112.
Reddit. (2018). Reddit's Year in Review. Retrieved from Reddit:
https://redditblog.com/2018/12/04/reddit-year-in-review-2018/
NLTK Project. (n.d.). Natural Language Toolkit. Retrieved from NLTK 3.4 documentation:
https://www.nltk.org/
Civil Aviation Department (CAD). (2018). Air Traffic Statistic. Retrieved from Civil Aviation Department:
https://www.cad.gov.hk/english/pdf/Stat%20Webpage.xlsx
Hong Kong Tourism Board. (2018). Hong Kong Tourism Board Annual Report 2017/18. Hong Kong: Hong
Kong Tourism Board.
Airport Authority Hong Kong. (n.d.). Retrieved from Hong Kong International Airport:
https://www.hongkongairport.com/en/
Airport Authority Hong Kong. (2018). Hong Kong International Airport Annual Report 2017/18. Hong
Kong.
Irfan, R., King, C. K., Grages, D., Ewen, S., Khan, S. U., Madani, S. A., . . . Li, H. (2015). A survey on text
mining in social networks. The Knowledge Engineering Review, 30(2), 157-170.
Hashimi, H., Hafez, A., & Mathkour, H. (2015). Selection criteria for text mining approaches. Computers in
Human Behavior, 51, 729-733.
Shen, C.-w., Chen, M., & Wang, C.-c. (2018). Analyzing the trend of O2O commerce by bilingual text
mining on social media. Computers in Human Behavior.
Gokasar, I., & Gunay, G. (2017). Mode choice behavior modeling of ground access to airports: A case study
in Istanbul, Turkey. Journal of Air Transport Management(59), 1-7.
Choo, S., You, S. I., & Lee, H. (2013). Exploring characteristics of airport access mode choice: a case study
of Korea. Transportation Planning and Technology, 36(4), 335-351.
Tam, M., Tam, M., & Lam, W. (2005). Analysis of airport access mode choice: a case study in Hong Kong.
Journal of the Eastern Asia Society for Transportation Studies, 6, 708-723.
Alhussein, S. N. (2011). Analysis of ground access modes choice King Khaled International Airport,
Riyadh, Saudi Arabia. Journal of Transport Geography, 19, 1361-1367.
Park, A., Conway, M., & Chen, A. T. (2017). Examining thematic similarity, difference, and membership in
three online mental health communities from reddit: A text mining and visualization approach.
Computers in Human Behavior, 78, 98-112.
Reddit. (2018). Reddit's Year in Review. Retrieved from Reddit: https://redditblog.com/2018/12/04/reddit-
year-in-review-2018/
NLTK Project. (n.d.). Natural Language Toolkit. Retrieved from NLTK 3.4 documentation:
https://www.nltk.org/