Airline Customer Segmentation Analysis
Added on 2020/03/16
AI Summary
This assignment focuses on segmenting airline customers based on their transactional data. It utilizes both K-Means and hierarchical clustering algorithms to identify distinct customer groups. The analysis reveals insights into customer behavior, such as flight frequency, bonus transactions, and miles earned. Based on these segments, the assignment proposes targeted marketing offers for specific customer groups, aiming to enhance customer engagement and loyalty.
Data Mining
Student Id and Name
[Pick the date]
Question 1
XLMiner is used to apply association rule mining to the data variables. The inputs are as
follows:
Number of transactions = 500
Number of variables = 7
Minimum confidence = 0.5 (50%)
XLMiner has generated the output, and part of the list of rules is shown below:
(i) After analyzing the list of rules generated by XLMiner, the following conclusions have been
derived regarding the first three association rules (Ana, 2014).
Rule 1 - As per the first association rule, a customer who purchases a brush also purchases
nail polish with an estimated confidence of 100%.
Rule 2 - As per the second association rule, a customer who purchases nail polish also
purchases a brush with an estimated confidence of 63.22%.
Rule 3 - As per the third association rule, a customer who purchases nail polish also
purchases bronzer with an estimated confidence of 59.20%.
(ii) Redundancy is a potential issue in association rules, and redundant rules need to be
trimmed. A particular case of redundancy in the given data pertains to rule 2, which has
exactly the same support and lift ratio as rule 1 because both rules involve the same pair of
items. The only difference is that the confidence level of rule 2 is lower than that of rule 1,
which makes rule 2 inferior to rule 1 (Zaki, 2000).
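The point that two rules over the same item pair share support and lift but not confidence can be checked with a minimal sketch. The baskets below are made up for illustration and are not the assignment's 500 transactions; the function name `rule_metrics` is likewise hypothetical.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(transactions)
    n_a = sum(1 for t in transactions if antecedent in t)
    n_c = sum(1 for t in transactions if consequent in t)
    n_both = sum(1 for t in transactions if antecedent in t and consequent in t)
    support = n_both / n              # share of baskets containing both items
    confidence = n_both / n_a         # share of antecedent baskets that also hold the consequent
    lift = confidence / (n_c / n)     # confidence relative to the consequent's base rate
    return support, confidence, lift


# Hypothetical baskets (assumption, for illustration only).
baskets = [
    {"brush", "nail polish"},
    {"brush", "nail polish", "bronzer"},
    {"nail polish", "bronzer"},
    {"nail polish"},
    {"bronzer"},
]

forward = rule_metrics(baskets, "brush", "nail polish")
reverse = rule_metrics(baskets, "nail polish", "brush")
print("brush -> nail polish:", forward)
print("nail polish -> brush:", reverse)
```

In this toy data, support and lift agree for the two directions, while confidence differs because it is divided by the count of the antecedent only.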
Additionally, the utility of association rules lies in the fact that they enable identification
of hidden associations prevalent in consumer buying behavior. However, in order to use them,
a balance between support and confidence is required. This is because if the minimum support
is high, the rules regarding rare items are not displayed. If, however, the minimum support is
kept at a low value, a very large number of rules is generated, which tends to undermine their
end utility, and hence this is not recommended (Liebowitz, 2015).
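The support trade-off described above can be sketched by counting how many item pairs survive different minimum support thresholds. This is only an illustration on hypothetical baskets; the function `frequent_pairs` and the thresholds are assumptions, not XLMiner settings from the assignment.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return item pairs whose support (fraction of baskets) is >= min_support."""
    counts = Counter()
    for basket in transactions:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


# Hypothetical baskets (assumption, for illustration only).
baskets = [
    {"brush", "nail polish"},
    {"brush", "nail polish", "bronzer"},
    {"nail polish", "bronzer"},
    {"nail polish", "bronzer", "concealer"},
    {"bronzer", "concealer"},
    {"nail polish"},
]

strict = frequent_pairs(baskets, 0.5)   # high threshold: rare pairs are dropped
loose = frequent_pairs(baskets, 0.1)    # low threshold: many more pairs pass
print(len(strict), "pair(s) at min support 0.5")
print(len(loose), "pair(s) at min support 0.1")
```

With a high threshold only the most common pair remains; with a low threshold every observed pair passes, which is the growth in rule count the paragraph warns about.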
(iii) In this case, the inputs are the same; only the minimum confidence has changed, from
0.50 to 0.75.
Minimum confidence = 0.75 or 75%
XLMiner has generated the output, and part of the list of rules is shown below.
Increasing the minimum confidence from 0.5 to 0.75 decreases the number of association rules
displayed. This is because only the association rules that meet the selected minimum
confidence appear. Hence, rules with a confidence lower than the minimum confidence are
removed automatically by XLMiner. This could be problematic, since the rules not displayed
may have high support levels and hence be significant (Ragsdale, 2014).
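The filtering effect can be illustrated with the three confidence values reported in part (i). This is a sketch only: the helper `filter_rules` is hypothetical, and treating the threshold as inclusive (`>=`) is an assumption about how the cut-off is applied.

```python
# Confidence values of the first three rules as reported in part (i).
rules = [
    ("brush", "nail polish", 1.0000),
    ("nail polish", "brush", 0.6322),
    ("nail polish", "bronzer", 0.5920),
]


def filter_rules(rules, min_confidence):
    """Keep only rules whose confidence is at least min_confidence (assumed inclusive)."""
    return [rule for rule in rules if rule[2] >= min_confidence]


at_50 = filter_rules(rules, 0.50)
at_75 = filter_rules(rules, 0.75)
print(len(at_50), "rule(s) kept at minimum confidence 0.50")
print(len(at_75), "rule(s) kept at minimum confidence 0.75")
```

Of these three rules, only the brush to nail polish rule would remain at the 0.75 threshold.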
Question 2
(a) To determine the total number of clusters formed from the data, a dendrogram needs to
be prepared through XLMiner.
The dendrogram for the data variables, which is used to determine the total number of clusters
formed, is highlighted below:
The above output was generated using the Ward method, with three clusters chosen. The output
confirms this choice: if a horizontal line is drawn at a distance of about 990, it intersects the
dendrogram at three unique points, reflecting that three clusters have been obtained.
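Cutting the dendrogram with a horizontal line amounts to stopping the agglomeration once the desired number of clusters remains. The sketch below is a naive pure-Python version of Ward-style merging on made-up 2-D points; it is not XLMiner's implementation, the function names are hypothetical, and its merge cost is not claimed to be on the same scale as the distance axis of the dendrogram above.

```python
def ward_merge_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b are merged."""
    na, nb = len(a), len(b)
    centroid_a = [sum(col) / na for col in zip(*a)]
    centroid_b = [sum(col) / nb for col in zip(*b)]
    sq_dist = sum((x - y) ** 2 for x, y in zip(centroid_a, centroid_b))
    return na * nb / (na + nb) * sq_dist


def ward_clusters(points, n_clusters):
    """Start with one cluster per point and merge the cheapest pair until n_clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: ward_merge_cost(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical points forming three visibly separate groups.
points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (5.0, 5.0), (5.1, 4.8), (9.0, 1.0), (9.2, 1.1)]
result = ward_clusters(points, 3)
sizes = sorted(len(c) for c in result)
print(sizes)
```

For real data sets, a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="ward"` would normally be preferred over this quadratic-per-merge loop.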
(b) When standard normalization of the data is not performed before conducting the
clustering analysis, the following issues can arise (Abramowics, 2013).
It reduces the overall accuracy of the result.
It distorts the distances between the centroids, which adversely impacts the usage of these
results.
Lack of normalization would also transform spherical clusters into elliptical clusters. This
would create a problem for the respective clustering algorithm.
Scale differences would also create a problem, especially when a variable has a significantly
high magnitude.
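The scale issue listed above can be made concrete with a small sketch. The four customers and their two variables are hypothetical, and `standardize` is an illustrative z-score helper, not XLMiner's normalization routine.

```python
from math import dist
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column: subtract the column mean, divide by its standard deviation."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sigmas = [pstdev(c) for c in cols]
    return [tuple((x - m) / s for x, m, s in zip(row, mus, sigmas)) for row in rows]


# Hypothetical customers: (miles balance, flight transactions in the past year).
customers = [(20000, 1), (21000, 12), (60000, 2), (61000, 11)]
scaled = standardize(customers)

# Compare customer 0 with a customer who differs mainly in flights (1)
# and with a customer who differs mainly in balance (2).
raw_ratio = dist(customers[0], customers[2]) / dist(customers[0], customers[1])
scaled_ratio = dist(scaled[0], scaled[2]) / dist(scaled[0], scaled[1])
print(round(raw_ratio, 1), round(scaled_ratio, 2))
```

On the raw values, the miles balance dominates the Euclidean distance and the difference in flight transactions is almost invisible; after standardization the two variables contribute on a comparable scale.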
(c) The clusters ought to be labeled on the basis of the common characteristics that they
display under the hierarchical clustering, as highlighted below (Ana, 2014).
Cluster 1
Key Observations:
Non-flight bonus transactions are quite minimal (lowest of all clusters)
Flight transactions are quite minimal in the past one year
Balance in terms of miles eligible for award travel is the lowest of all clusters
Appropriate Label: “Middle Class Flyers”
Cluster 2
Key Observations:
Non-flight bonus transactions are quite substantial (second only to cluster 3)
Flight transactions are the highest in the past one year
Balance in terms of miles eligible for award travel is the highest of all clusters
Also, some have high miles counted towards qualification for Top Flight status
Appropriate Label: “High Networth Flyers”
Cluster 3
Key Observations:
Non-flight bonus transactions are highest amongst all clusters.
However, flight transactions are comparatively very low in the past one year
Balance in terms of miles eligible for award travel is significant, though lower than in
cluster 2.
Appropriate Label: “Non-frequent Flyers”
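Observations of this kind are typically read off a per-cluster summary of the variables. The sketch below shows one way such a profile could be computed; the records, the variable names (`balance`, `bonus_trans`, `flight_trans`) and the cluster assignments are all hypothetical and merely arranged to mirror the pattern described above.

```python
from collections import defaultdict
from statistics import mean


def cluster_profile(rows, labels):
    """Return {cluster_id: {variable: mean value}} for rows given as dicts."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    return {
        label: {var: mean(r[var] for r in members) for var in members[0]}
        for label, members in grouped.items()
    }


# Hypothetical customer records and cluster assignments (assumption).
rows = [
    {"balance": 30000, "bonus_trans": 5, "flight_trans": 0},
    {"balance": 40000, "bonus_trans": 7, "flight_trans": 1},
    {"balance": 150000, "bonus_trans": 18, "flight_trans": 14},
    {"balance": 130000, "bonus_trans": 16, "flight_trans": 10},
    {"balance": 110000, "bonus_trans": 22, "flight_trans": 1},
    {"balance": 90000, "bonus_trans": 20, "flight_trans": 2},
]
labels = [1, 1, 2, 2, 3, 3]

profile = cluster_profile(rows, labels)
most_flights = max(profile, key=lambda c: profile[c]["flight_trans"])
most_bonus = max(profile, key=lambda c: profile[c]["bonus_trans"])
lowest_balance = min(profile, key=lambda c: profile[c]["balance"])
print(most_flights, most_bonus, lowest_balance)
```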
(d) The XLMiner output for K-Means clustering is shown below.
There is parity in terms of the number of clusters formed. However, close scrutiny of the
characteristics indicates an underlying difference.
Characteristics of Cluster 1 (K-Means Clustering)
Highest miles eligible for award travel
Highest frequency of flight transactions in the past year
Highest miles counted for Top Flight status
In line with the above observations, it would be fair to label this cluster “High Networth
Flyers”. However, comparison with the hierarchical clustering output clearly indicates a
difference, as in that case it was cluster 2 that comprised these flyers. Thus, it would be fair
to assume that the output of K-Means clustering differs from that of hierarchical clustering
(Grossmann & Rinderle-Ma, 2015).
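Because cluster numbers are arbitrary labels, one way to compare the two outputs is to cross-tabulate the assignments and see which K-Means cluster each hierarchical cluster overlaps most. The sketch below uses made-up assignments for eight records; the function names are hypothetical and nothing here is taken from the XLMiner output.

```python
from collections import Counter


def crosstab(labels_a, labels_b):
    """Count how many records fall in each (label_a, label_b) combination."""
    return Counter(zip(labels_a, labels_b))


def best_match(labels_a, labels_b):
    """For each cluster in labeling A, find the cluster in labeling B it overlaps most."""
    table = crosstab(labels_a, labels_b)
    match = {}
    for (a, b), count in table.items():
        if a not in match or count > table[(a, match[a])]:
            match[a] = b
    return match


# Hypothetical assignments for the same eight customers (assumption).
hierarchical = [1, 1, 1, 2, 2, 3, 3, 3]
kmeans = [3, 3, 2, 1, 1, 2, 2, 2]

mapping = best_match(hierarchical, kmeans)
print(mapping)
```

In this toy case hierarchical cluster 2 lines up with K-Means cluster 1, so a change in numbering alone would not imply different membership; the off-diagonal counts in the cross-tabulation are what indicate records that the two methods group differently.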
(e) The clusters chosen as targets, owing to their current contribution and future potential,
are clusters 2 and 3 as per the hierarchical clustering. The offer for cluster 2 would involve an
incentive for higher use of the frequent flyer card issued by the airline, leading to higher
reward points. Also, the bonus miles provided could be linked to a threshold number of annual
check-ins.
For cluster 3, an incentive in the form of higher reward points needs to be outlined so that the
customer uses the frequent flyer card and the bonus points are utilized for flight transactions.
References
Abramowics, W. (2013) Business Information Systems Workshops: BIS 2013 International
Workshops (5th ed.). New York: Springer.
Ana, A. (2014) Integration of Data Mining in Business Intelligence System (4th ed.). Sydney:
IGA Global.
Grossmann, W. & Rinderle-Ma, S. (2015) Fundamentals of Business Intelligence (2nd ed.). New
York: Springer.
Liebowitz, J. (2015) Business Analytics: An Introduction (2nd ed.). New York: CRC Press.
Ragsdale, C. (2014) Spreadsheet Modelling and Decision Analysis: A Practical Introduction to
Business Analytics (7th ed.). London: Cengage Learning.
Zaki, M.J. (2000) Generating non-redundant association rules. In: Proceedings of the ACM
SIGKDD, pp. 34–43.