Data Mining Assignment: XL Miner, Association, Clustering Analysis


Added on  2020/04/01

Homework Assignment
AI Summary
This assignment solution delves into data mining, focusing on association rules and clustering techniques using XL Miner. Question 1 analyzes association rules, exploring concepts like confidence levels and redundancy, with examples of brush, nail polish, and bronzer purchases. Question 2 focuses on cluster formation using airlines data, comparing hierarchical and K-means clustering. It discusses the importance of data normalization and identifies three clusters: "Middle Class Travellers," "High Networth Flyers," and "Non-frequent Fliers." The solution highlights the differences between the two clustering methods and proposes targeted offers for each customer segment, aiming to enhance loyalty and encourage increased travel frequency. The assignment provides a comprehensive analysis of data mining principles and their practical application in customer segmentation and business strategy.
DATA MINING
STUDENT ID:
[Pick the date]
Question 1 (Association Rules)
XL Miner has been used to obtain the output for association rules. The list of rules obtained
with a minimum support of 100 transactions is illustrated as follows.
i) The three vital rules obtained from the XL Miner output referred to above are as follows.
Rule No. 1- A brush purchase implies a nail polish purchase. The confidence of this rule is
100%, i.e. the conditional probability of a nail polish purchase given a brush purchase is 1.
Hence, in this data, every transaction containing a brush also contains nail polish.
Rule No. 2- A nail polish purchase implies a brush purchase. The confidence of this rule is
63.22%, i.e. the conditional probability of a brush purchase given a nail polish purchase is
0.6322.
Rule No. 3- A nail polish purchase implies a bronzer purchase. The confidence of this rule is
59.19%, i.e. the conditional probability of a bronzer purchase given a nail polish purchase is
0.5919.
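The confidence of a rule A -> B is the share of transactions containing A that also contain B, i.e. support(A and B) / support(A). A minimal Python sketch of that calculation follows. The baskets are made-up toy data, not the assignment's dataset, and the helper names `support_count` and `confidence` are illustrative and not part of XL Miner.

```python
# Hypothetical toy baskets, NOT the data used in the assignment.
transactions = [
    {"brush", "nail polish"},
    {"brush", "nail polish", "bronzer"},
    {"nail polish", "bronzer"},
    {"nail polish"},
    {"bronzer"},
]


def support_count(itemset, baskets):
    """Number of baskets that contain every item in `itemset`."""
    return sum(1 for basket in baskets if itemset <= basket)


def confidence(antecedent, consequent, baskets):
    """Estimate P(consequent | antecedent) as support(A and B) / support(A)."""
    a_count = support_count(antecedent, baskets)
    if a_count == 0:
        raise ValueError("antecedent never occurs in the baskets")
    return support_count(antecedent | consequent, baskets) / a_count


brush_to_polish = confidence({"brush"}, {"nail polish"}, transactions)
polish_to_brush = confidence({"nail polish"}, {"brush"}, transactions)
print(f"brush -> nail polish: {brush_to_polish:.2%}")
print(f"nail polish -> brush: {polish_to_brush:.2%}")
```

In this toy data every brush basket also contains nail polish, so the first rule has a confidence of 1, while the reverse rule is weaker. This mirrors the asymmetry between Rule No. 1 and Rule No. 2.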
ii) Definition: A rule is considered redundant only in relation to another rule: it is redundant
if, for every dataset, its support and confidence are at least equal to those of the other rule.
Among the first couple of dozen rules, redundancy is observed for rule no. 16 and rule no. 17.
Rule no. 2 also shows signs of redundancy, although its confidence level differs, as is apparent
from the above output.
The rules have limited utility in that purchase patterns can be deciphered from the
corresponding conditional probabilities and then interpreted in light of the relevant
theoretical framework and literature to derive meaningful conclusions. At times, the rules also
help to validate an indicated pattern, which can eventually yield useful information.
iii) In the given case, raising the minimum confidence level from 50% to 75% reduces the number
of rules. In the earlier case, rules with a confidence level in excess of 50% were listed; in
this case, only rules with a confidence level greater than 75% are listed in the XL Miner
output. Hence, only one rule, which has a confidence level of 100%, is stated.
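The effect of the threshold can be sketched in Python using only the three confidence values quoted in part (i). The full XL Miner rule list is not reproduced here, and the helper `filter_rules` is a hypothetical name. Treating the threshold as a strict "greater than" is an assumption; the tool may use "at least".

```python
# Confidence values (in %) quoted for the three rules in part (i).
rules = [
    ("brush", "nail polish", 100.00),
    ("nail polish", "brush", 63.22),
    ("nail polish", "bronzer", 59.19),
]


def filter_rules(rules, min_confidence):
    """Keep rules whose confidence (in %) exceeds the threshold (assumed strict)."""
    return [rule for rule in rules if rule[2] > min_confidence]


at_50 = filter_rules(rules, 50)
at_75 = filter_rules(rules, 75)
print(len(at_50), "rules pass at 50%;", len(at_75), "rule passes at 75%")
```

All three quoted rules pass the 50% threshold, while only the brush -> nail polish rule survives at 75%.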
Question 2
a) The cluster formation can be understood from the dendrogram obtained for the airlines data.
Drawing a horizontal line across the dendrogram at a distance not exceeding 1000 yields three
clusters.
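Cutting a dendrogram at a given height is equivalent to stopping the agglomerative merging once the closest pair of clusters is farther apart than the cutoff. The Python sketch below illustrates this on made-up 2-D points. The source does not state the linkage or distance measure used in XL Miner, so single linkage and Euclidean distance are assumptions chosen for brevity, and `cut_clusters` is a hypothetical helper.

```python
import math


def single_linkage(c1, c2):
    """Smallest Euclidean distance between any two points of the two clusters."""
    return min(math.dist(p, q) for p in c1 for q in c2)


def cut_clusters(points, cutoff):
    """Merge the closest pair of clusters until no pair is within `cutoff`."""
    clusters = [[p] for p in points]
    while len(clusters) > 1:
        # Closest pair of clusters, as (distance, index i, index j) with i < j.
        d, i, j = min(
            (single_linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > cutoff:
            break  # the horizontal line lies below the next merge
        merged = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i].extend(merged)
    return clusters


# Hypothetical points forming three well-separated groups (not the airlines data).
points = [(0, 0), (100, 50), (5000, 5000), (5200, 5100), (12000, 300), (12500, 100)]
clusters = cut_clusters(points, cutoff=1000)
print(len(clusters), "clusters at cutoff 1000")
```

With these toy points, every within-group merge happens below a distance of 1000 and every between-group distance is far above it, so the cut leaves three clusters.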
b) Data normalisation is critical for the given problem, as certain issues could arise in its
absence. Firstly, variables measured on a larger scale would be given greater importance, and
hence the distance computation could be adversely affected. Additionally, the results obtained
would be distorted by the mismatched scales. Hence, to avoid these issues, normalisation of the
data is the suggested practice.
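The scale effect can be illustrated with a small Python sketch. The three customers and their values are hypothetical, and z-score standardisation is used here as one common form of normalisation; the source does not state which form XL Miner applied.

```python
import math
import statistics

# Hypothetical customers: (balance, flight transactions in the last 12 months).
customers = [(20000, 1), (21000, 12), (60000, 2)]


def z_scores(column):
    """Standardise a column to mean 0 and population standard deviation 1."""
    mean = statistics.fmean(column)
    sd = statistics.pstdev(column)
    return [(x - mean) / sd for x in column]


balance, flights = zip(*customers)
normalised = list(zip(z_scores(balance), z_scores(flights)))

# Distances from customer 0 to customers 1 and 2, before and after normalisation.
raw_01 = math.dist(customers[0], customers[1])
raw_02 = math.dist(customers[0], customers[2])
norm_01 = math.dist(normalised[0], normalised[1])
norm_02 = math.dist(normalised[0], normalised[2])

print(f"raw:        d(0,1)={raw_01:.1f}  d(0,2)={raw_02:.1f}")
print(f"normalised: d(0,1)={norm_01:.3f}  d(0,2)={norm_02:.3f}")
```

On the raw values, the balance column dominates: customer 0 appears about forty times closer to customer 1 than to customer 2, even though customer 1 flies twelve times as often. After standardisation the two distances are of comparable size, so the difference in flight transactions is no longer drowned out by the balance scale.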
c) Given the three clusters formed in the hierarchical clustering, as highlighted above, the
labelling can be carried out as follows.
Cluster 1- Named as “Middle Class Travellers”
Reason: The level of spending is greater than that of cluster 3 but less than that of cluster 2.
The other characteristics, namely balance, miles collected and travel frequency, are also
greater than those of cluster 3 but less than those of cluster 2.
Cluster 2- Named as “High Networth Flyers”
Reason: These customers have the longest enrolment period coupled with the highest balance of
points available. Their transaction frequency is also significantly higher than that of the
other two clusters. Thus, these are customers who fly frequently and have been with the
airline for a long time.
Cluster 3- Named as “Non-frequent Fliers”
Reason: Their travel frequency over the last 12 months is extremely low. This low frequency
typically also affects the other indicators that have been considered.
d) K-means clustering has been performed on the given data with the aid of XL Miner and the
relevant output presented.
The three clusters produced by K-means are quite different from those derived through
hierarchical clustering; the clusters do not match. Based on the characteristics summarised
above, the high networth fliers appear to correspond to Cluster 1, considering the balance and
the flight transactions in the last 12 months. The infrequent fliers appear to belong to
Cluster 2, which has the lowest balance along with the lowest flight transactions during the
last 12-month period. The differences between the clusterings obtained under the two processes
are summarised in the following table.
Hence, based on the above, it is apparent that the results obtained under the two clustering
techniques differ significantly.
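One way to compare two clusterings is to cross-tabulate the labels that each method assigns to the same customers. The Python sketch below uses hypothetical labels for eight customers, not the assignment's output, and assumes both label lists are in the same customer order.

```python
from collections import Counter

# Hypothetical cluster labels for the same eight customers (illustrative only).
hierarchical = [1, 1, 2, 2, 3, 3, 3, 1]
kmeans = [3, 3, 1, 1, 2, 2, 3, 2]

# Count the customers in each (hierarchical label, K-means label) pair.
crosstab = Counter(zip(hierarchical, kmeans))

k_labels = sorted(set(kmeans))
print("K-means labels:", k_labels)
for h in sorted(set(hierarchical)):
    row = [crosstab[(h, k)] for k in k_labels]
    print(f"hierarchical {h}: {row}")
```

Cluster numbers are arbitrary in both methods, so a row whose customers all land in a single column indicates the same group under a different number, whereas a row spread across several columns indicates that the two methods genuinely group those customers differently.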
e) The clusters to be targeted are as follows.
Cluster 2- These are frequent flyers and form the loyal customer base, and hence essentially
provide the bulk of the business.
Offers: A high percentage of bonus miles if the number of yearly transactions exceeds 18.
Further, better loyalty benefits need to be provided so that these customers choose the given
airline over its competitors.
Cluster 3- Currently, these are infrequent travellers and hence present a huge opportunity to
become the future growth engine of the airline's business.
Offers: Special offers should be made to enhance their experience, along with a favourable
conversion of bonus points into air miles. They should also be encouraged to join the loyalty
program through the issue of a card.