Data Mining Analysis and Clustering

Added on 2020/03/16

AI Summary

This assignment delves into data mining concepts, focusing on association rule mining and hierarchical clustering. It analyzes example output from association rule generation, discussing the impact of confidence levels on rule selection. The assignment further explores hierarchical clustering visualized through a dendrogram, identifying three distinct customer clusters based on travel patterns. Finally, it proposes targeted marketing strategies for each cluster using data-driven insights.

DATA MINING
STUDENT ID:

Question 1
The output for association rules is shown below.
i) Rule 1: If an individual buys a brush, then nail polish is also purchased.
Confidence level: 100%
Rule 2: If an individual buys nail polish, then a brush is also purchased.
Confidence level: 63.22%
Rule 3: If an individual buys nail polish, then bronzer is also purchased.
Confidence level: 59.19%
ii) One instance of a redundant rule is rule 2, since rule 1 already establishes the relation between
nail polish and brush and does so with a higher confidence level. The utility of the given rules is
assessed primarily through their confidence levels, which indicate the probability that the
consequent is purchased given that the antecedent is purchased. For instance, rule 1 is highly
useful because its confidence level is 100%, which implies that every purchase of a brush in the
data was accompanied by a purchase of nail polish (Ana, 2014).
In addition, the utility of a rule depends on two parameters:
• Confidence level
• Lift ratio
Ideally, both should be high for a rule to be useful, but a rule may still be significant if only
one of them is high. Together, these two measures provide the relevant information about the
purchase patterns (Abramowics, 2013).
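As a minimal sketch of how the two measures are computed, the snippet below uses a small set of made-up baskets. The transactions and the helper names (support, confidence, lift) are assumptions for illustration only and are not the assignment's data or tool output.

```python
from typing import FrozenSet, List

# Hypothetical transactions, invented purely for illustration.
transactions: List[FrozenSet[str]] = [
    frozenset({"brush", "nail polish"}),
    frozenset({"brush", "nail polish", "bronzer"}),
    frozenset({"nail polish", "bronzer"}),
    frozenset({"nail polish"}),
    frozenset({"bronzer"}),
]

def support(items: FrozenSet[str]) -> float:
    """Fraction of transactions that contain every item in `items`."""
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)

def confidence(antecedent: FrozenSet[str], consequent: FrozenSet[str]) -> float:
    """Share of antecedent baskets that also contain the consequent."""
    return support(antecedent | consequent) / support(antecedent)

def lift(antecedent: FrozenSet[str], consequent: FrozenSet[str]) -> float:
    """Confidence divided by the baseline support of the consequent."""
    return confidence(antecedent, consequent) / support(consequent)

brush = frozenset({"brush"})
polish = frozenset({"nail polish"})

conf_brush_to_polish = confidence(brush, polish)   # brush -> nail polish
conf_polish_to_brush = confidence(polish, brush)   # nail polish -> brush
lift_brush_to_polish = lift(brush, polish)
lift_polish_to_brush = lift(polish, brush)
```

In this toy data the two directions of the same item pair have different confidence, while the lift is identical in both directions, which is why the two measures are read together.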
iii) Relevant Output:

The chosen minimum confidence level determines how many association rules are displayed, because
only rules whose confidence is higher than the stated threshold are shown. Hence, for the given
case, the output consists of only one rule, since only one has a confidence level in excess of
75%. This can have adverse consequences, as some rules with a high lift ratio may be discarded
(Shumueli et al., 2016).
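The effect of the threshold can be sketched with the three confidence values reported in part i). The function name filter_rules and the 50% comparison threshold are assumptions made for illustration; whether a given tool keeps rules exactly at the threshold is also an assumption here.

```python
# Confidence values (in %) as reported for the three rules in Question 1.
rules = [
    ("brush -> nail polish", 100.00),
    ("nail polish -> brush", 63.22),
    ("nail polish -> bronzer", 59.19),
]

def filter_rules(rules, min_confidence):
    """Keep only the rules whose confidence (in %) meets the threshold."""
    return [(name, conf) for name, conf in rules if conf >= min_confidence]

kept_at_75 = filter_rules(rules, 75.0)  # threshold discussed in the text
kept_at_50 = filter_rules(rules, 50.0)  # hypothetical lower threshold

for name, conf in kept_at_75:
    print(f"{name}: {conf:.2f}%")
```

Raising the threshold from 50% to 75% drops two of the three rules regardless of their lift, which is the trade-off described above.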
Question 2
a) The dendrogram for the given data is shown below.
Three main clusters are visible in the dendrogram; they become apparent at a distance cutoff
value of 1000.
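Reading clusters off a dendrogram at a cutoff amounts to stopping the merging once the closest pair of clusters is farther apart than the cutoff. The sketch below assumes single linkage and uses invented two-dimensional points; the assignment does not state the linkage method or the data, so both are assumptions, and only the cutoff of 1000 comes from the text.

```python
from itertools import combinations
from math import dist

def single_linkage_clusters(points, cutoff):
    """Merge the closest clusters until the closest pair is farther than `cutoff`."""
    clusters = [[p] for p in points]

    def gap(a, b):
        # Single linkage: distance between the closest members of two clusters.
        return min(dist(p, q) for p in a for q in b)

    while len(clusters) > 1:
        # Find the pair of clusters (i < j) with the smallest gap.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: gap(clusters[ij[0]], clusters[ij[1]]),
        )
        if gap(clusters[i], clusters[j]) > cutoff:
            break  # every remaining merge would lie above the cut line
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical customers, invented for illustration only.
points = [
    (0, 0), (100, 50), (200, 0),
    (5000, 5000), (5100, 4900),
    (20000, 100), (20500, 300),
]

clusters_at_1000 = single_linkage_clusters(points, cutoff=1000)
clusters_at_10000 = single_linkage_clusters(points, cutoff=10000)
print(len(clusters_at_1000), len(clusters_at_10000))
```

A higher cutoff allows more merges and therefore yields fewer clusters, so the number of clusters reported always depends on where the cut is placed.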

b) Without data normalisation, the variables remain on mismatched scales, which distorts the end
results. In particular, the distance computation becomes unreliable because variables with a
larger magnitude and scale dominate the distances and drown out the other variables. It is
therefore recommended that clustering be carried out on normalised data so that the result
obtained is accurate (Ana, 2014).
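The scale effect can be illustrated with a small sketch. The four customers below are invented, and z-score standardisation is assumed as the normalisation method, since the assignment does not say which one was used.

```python
from math import dist
from statistics import mean, pstdev

# Hypothetical customers: (balance in miles, flight transactions in last 12 months).
customers = [(20000, 1), (21000, 12), (60000, 2), (61000, 11)]

def z_scores(rows):
    """Standardise each column to mean 0 and (population) standard deviation 1."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    stdevs = [pstdev(c) for c in columns]
    return [
        tuple((v - m) / s for v, m, s in zip(row, means, stdevs))
        for row in rows
    ]

scaled = z_scores(customers)

# Raw distances: customer 0 looks far closer to customer 1 (similar balance)
# than to customer 2 (similar flight activity), because balance dominates.
raw_01 = dist(customers[0], customers[1])
raw_02 = dist(customers[0], customers[2])

# Standardised distances: both variables now contribute on a comparable scale.
scaled_01 = dist(scaled[0], scaled[1])
scaled_02 = dist(scaled[0], scaled[2])

print(f"raw:    d(0,1)={raw_01:.1f}  d(0,2)={raw_02:.1f}")
print(f"scaled: d(0,1)={scaled_01:.2f}  d(0,2)={scaled_02:.2f}")
```

On the raw values the difference in flight transactions is practically invisible next to the balance, whereas after standardisation the two distances are of the same order.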
c) Cluster 1
Characteristics: Few transactions (both flight and non-flight) and low spending, as indicated by
the balance and qual_miles
Name: “Middle Class Travellers”
Relevant Output:
Cluster 2
Characteristics: High frequency of flight transactions. Also, the balance of miles eligible for
awards is quite high.
Name: “High Networth Flyers”
Relevant Output:

Cluster 3
Characteristics: Very low frequency of flight transactions, indicating that these customers fly
infrequently, coupled with a high number of non-flight bonus transactions.
Name: “Non-frequent Flyers”
Relevant Output:
d) K-Means Clustering Output

The cluster formation for k-means is clearly different from that for hierarchical clustering.
Consider, for example, cluster 1 above. It has the highest value for every parameter. The most
noticeable aspect is the average number of flight transactions in the last 12 months, which is
quite high. This is the high networth flyer group, which forms cluster 2 under hierarchical
clustering. Hence, even without comparing the other clusters, the difference in pattern can be
confirmed (Ragsdale, 2014).
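For comparison with the merge-based approach, k-means repeatedly assigns each point to its nearest centroid and then recomputes the centroids. The following is a minimal sketch of that loop (plain Lloyd's algorithm), assuming invented, already-standardised points and hand-picked starting centroids; it is not the tool or the data used in the assignment.

```python
from math import dist

def k_means(points, centroids, max_iter=100):
    """Plain Lloyd's algorithm: assign to the nearest centroid, then recompute means."""
    centroids = list(centroids)
    for _ in range(max_iter):
        # Assignment step: put every point in the group of its nearest centroid.
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            groups[nearest].append(p)
        # Update step: new centroid is the mean of its group (unchanged if empty).
        new_centroids = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return centroids, groups

# Hypothetical standardised values: (flight transactions, non-flight bonus transactions).
points = [
    (0.1, 0.2), (0.2, 0.1), (0.15, 0.15),   # low on both
    (5.0, 1.0), (5.2, 1.2), (4.8, 0.8),     # frequent flyers
    (0.2, 4.0), (0.1, 4.2), (0.3, 3.8),     # mostly non-flight activity
]
initial = [points[0], points[3], points[6]]  # hand-picked starting centroids

centroids, groups = k_means(points, initial)
for c, g in zip(centroids, groups):
    print(c, len(g))
```

Because the result depends on the number of clusters and the starting centroids, k-means numbers and composes its clusters independently of the dendrogram, so cluster labels from the two methods need not line up.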
e) The targeted clusters should be no. 2 and no. 3. Customers in cluster no. 2 are essentially the
most loyal and, being also the most frequent travellers, should be served well.
Customers in cluster no. 3 need to be provided with incentives to increase their travel by flight
instead of other means.
Cluster 2 offers: Higher air miles should be awarded for exceeding a fixed number of transactions,
so that loyalty increases further and the airline remains the preferred one for these customers.
Cluster 3 offers: Special discounts and benefits should be provided for first-time fliers so that
they can experience the airline, and special offers should be made in the lean season to enhance
occupancy and attract these customers.
References

Abramowics, W. (2013) Business Information Systems Workshops: BIS 2013 International
Workshops (5th ed.). New York: Springer.
Ana, A. (2014) Integration of Data Mining in Business Intelligence Systems (4th ed.). Sydney:
IGI Global.
Ragsdale, C. (2014) Spreadsheet Modeling and Decision Analysis: A Practical Introduction to
Business Analytics (7th ed.). London: Cengage Learning.
Shumueli, G., Bruce, C.P., Yahav, I., Patel, R. N., Kenneth, C., & Lichtendahl, J. (2016) Data
Mining for Business Analytics: Concepts, Techniques and Applications (2nd ed.). London:
John Wiley & Sons.