Analyzing Airline Quality with RapidMiner: A Predictive Analytics Study

   

Added on  2023-06-11

MIS772 Predictive Analytics Assignment A2
Executive summary (one page)
In this assignment we investigate the usefulness of the data collected by AQA for analysing the quality of
airlines. RapidMiner is used to analyse the dataset. The overall rating of an airline is taken as the response
variable. The other quantitative ratings collected by AQA (seat comfort, cabin staff, food and beverages,
inflight entertainment, value for money and recommended) are taken as the predictor variables.
Three models are used to check the predictability of the dataset: regression, neural net and decision tree.
Business Problem
Aim 1: Which of the independent variables is the most important driver of the overall rating of an airline?
Solution to Business Problem
The analysis shows that the decision tree model predicts the overall rating better than the other two models. The
absolute error of the decision tree model is close to 1. Moreover, “recommendation” is the most important driver of
the overall rating of an airline.
Data exploration and preparation in RapidMiner (one page)
The analysis of the dataset in RapidMiner shows that the variables are either polynominal or integer. The data
provided by AQA covers several aspects of the quality of service provided by airlines. The name of the airline
has no missing data. From Figure 2 it is found that most of the travellers have flown in economy class, and the
fewest have flown first class. The ratings provided by the customers contain missing data. The ground service and
wifi connectivity ratings have 39193 and 40831 missing values respectively, so these two factors were not
selected.
For the present analysis the name of the airline, the ratings (except ground service and wifi connectivity) and
recommendation were selected. The overall rating is the dependent variable, while the other selected ratings
and recommendation are the independent variables.
The overall rating given to an airline is on a scale from 1 to 10. The ratings on seat comfort, cabin staff, food and
beverages and inflight entertainment are on a scale of 1 to 5. The recommendation of an airline is recorded as either
1 or 0. All missing values were imputed with the average of the corresponding variable.
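The imputation step can be illustrated outside RapidMiner with a minimal Python sketch. The function name impute_with_mean and the sample ratings are hypothetical and are not taken from the AQA dataset; the sketch only assumes that each missing rating is replaced by the mean of the observed ratings in the same column.

```python
from statistics import mean

def impute_with_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    column_mean = mean(observed)
    return [column_mean if v is None else v for v in values]

# Hypothetical seat comfort ratings on the 1 to 5 scale; None marks a missing rating.
seat_comfort = [4, None, 2, 5, None, 3]
imputed = impute_with_mean(seat_comfort)
print(imputed)
```

Mean imputation leaves the column average unchanged but reduces its spread, and the imputed values are generally not whole numbers even though the original ratings are.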
After the missing values were replaced, Figure 1 shows that the average overall rating of an airline is 6.035 with a
standard deviation of 3.033. The average ratings of seat comfort, cabin staff and value money are 3.077, 3.260 and
3.158 respectively. The average ratings of food and beverages and value money are 2.844 and 2.295 respectively.
The overall rating is plotted as a histogram. The histogram shows that the most frequent overall rating given by
travellers across all the airlines is 6 (see Figure 2). The least frequent overall rating is 4 (see
Figure 2).
Figure 1: Overall statistics for ratings
Figure 2: Bar chart of travellers, histogram of overall rating
Discovering Relationships and Data Transformation in RapidMiner (one page)
Clustering is the process of dividing the dataset into groups with similar characteristics. The airlines of the world
have been clustered on the basis of the overall rating and the recommendations of the travellers. The k-means
clustering process is used for the cluster analysis, as it is among the simplest clustering methods. The process of
clustering is depicted in Figure 3.
Figure 3: Process of clustering
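As a rough illustration of what the clustering process computes, the following Python sketch runs plain k-means (Lloyd's algorithm) on a few hypothetical (overall rating, recommendation) pairs and reports the average Euclidean distance of the points to their centroids. The function names, the sample points and the use of plain rather than squared distance are assumptions made for the example; the report does not state how RapidMiner computes its distance measure.

```python
import math
import random

def assign(points, centroids):
    """Assignment step: index of the nearest centroid for each point."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points]

def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct points as starting centroids
    for _ in range(iterations):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                # Update step: centroid moves to the mean of its members.
                updated.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                updated.append(centroids[c])  # keep the centroid of an empty cluster
        if updated == centroids:
            break  # no centroid moved, so the assignment is stable
        centroids = updated
    return centroids, assign(points, centroids)

def average_distance(points, centroids, labels):
    """Average Euclidean distance from each point to its own centroid."""
    total = sum(math.dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return total / len(points)

# Hypothetical (overall rating, recommendation) pairs, not taken from the AQA data.
points = [(9, 1), (8, 1), (10, 1), (2, 0), (1, 0), (3, 0)]
centroids, labels = kmeans(points, k=2)
avg = average_distance(points, centroids, labels)
print(sorted(centroids), round(avg, 3))
```

Because the overall rating runs from 1 to 10 and the recommendation is 0 or 1, the rating dominates the distance in this unscaled form.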
The number of centroids is chosen so that the average Euclidean distance is smallest. Table 1 presents the
relation between the number of clusters and the average Euclidean distance. From the table it is found that 7
clusters provide the best result: the average distance falls to 1.863 at 7 clusters and rises slightly to 1.865 at 8.
Table 1: Relation of No. of clusters to average Euclidean distance
No. of clusters    Average Euclidean distance
2                  3.429
3                  2.799
4                  2.310
5                  2.171
6                  1.985
7                  1.863
8                  1.865
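The choice of the number of clusters can be checked directly from the values in Table 1. The short Python sketch below copies those values, picks the number of clusters with the smallest average distance, and lists how much the distance changes each time one more cluster is added. The variable names are illustrative only.

```python
# Average Euclidean distance by number of clusters, copied from Table 1.
avg_distance = {2: 3.429, 3: 2.799, 4: 2.310, 5: 2.171, 6: 1.985, 7: 1.863, 8: 1.865}

# Number of clusters with the smallest average distance.
best_k = min(avg_distance, key=avg_distance.get)

# Reduction in average distance gained by adding one more cluster.
gains = {k: round(avg_distance[k - 1] - avg_distance[k], 3)
         for k in sorted(avg_distance) if k - 1 in avg_distance}

print(best_k)  # 7
print(gains)
```

The gain is negative only for the step from 7 to 8 clusters, which is consistent with stopping at 7.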