Data Mining Assignment: Rules, Clustering, and XL Miner Analysis


Added on 2020/03/16

AI Summary

This data mining assignment delves into association rules, rule redundancy, and the application of XL miner with a minimum confidence level of 75%. It explores the significance of lift ratio and confidence levels in evaluating association rules. The assignment also covers hierarchical clustering using dendrograms, emphasizing the need for data normalization to ensure accurate centroid distance computations and cluster formation. It involves cluster labeling based on characteristics like flight and non-flight transactions and balance. K-means clustering is also discussed, comparing its classification with hierarchical clustering. Finally, the assignment outlines strategies for targeting specific clusters with tailored offers, such as providing reward points to frequent flyers and those with high bonus point usage.

Data Mining
Student Id and Name
[Pick the date]

Table of Contents
Question 1
(i) Association Rules
(ii) Rule Redundancy
(iii) XL miner (Minimum confidence = 75%)
Question 2
(a) Dendrogram
(b) Need for Normalisation
(c) Cluster Labeling
(d) K Means Clustering
(e) Target Clusters and Offers
References

Question 1
(i) Association Rules
Rule 1 - A customer who purchases a brush would also purchase nail polish, with an associated
conditional probability (confidence) of 1 (Zaki, 2000).
Rule 2 - A customer who purchases nail polish would also purchase a brush, with an associated
conditional probability (confidence) of 0.6322.
Rule 3 - A customer who purchases nail polish would also purchase bronzer, with an associated
conditional probability (confidence) of 0.5919.
(ii) Rule Redundancy
Association rule mining can produce redundant rules, i.e. rules whose underlying support
can be predicted from the rule that acts as their ancestor. Rule 2 in the given output is
an illustrative example. Rules 1 and 2 have the same support, but Rule 1 is superior
because its confidence is higher. Thus, Rule 2 is made redundant by Rule 1
(Abramowics, 2013).
To assess the usefulness of an association rule, the following measures are critical (Rouse, 2004).
• Lift ratio (the confidence of the rule relative to the baseline support of the consequent)
• Confidence level (the conditional probability of the consequent given the antecedent)
Rules that score highly on either or both of these measures are potentially useful, since they
convey information about the consumer buying pattern (Ana, 2014).
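The measures discussed above can be illustrated with a minimal sketch. The baskets below are hypothetical toy data (the item names echo the rules above, but the transactions are invented and are not the assignment's data set), and the helper names `support`, `confidence` and `lift` are assumptions chosen for this example.

```python
# Hypothetical toy baskets; invented for illustration, not the assignment's data.
transactions = [
    {"brush", "nail polish"},
    {"brush", "nail polish", "bronzer"},
    {"nail polish", "bronzer"},
    {"nail polish"},
    {"bronzer"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """P(consequent | antecedent) = support(both) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

def lift(antecedent, consequent, baskets):
    """Confidence divided by the baseline support of the consequent."""
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)

# Both directions of a rule share the same support but differ in confidence.
shared_support = support({"brush", "nail polish"}, transactions)
conf_brush_to_polish = confidence({"brush"}, {"nail polish"}, transactions)
conf_polish_to_brush = confidence({"nail polish"}, {"brush"}, transactions)
lift_brush_to_polish = lift({"brush"}, {"nail polish"}, transactions)

print(shared_support, conf_brush_to_polish, conf_polish_to_brush, lift_brush_to_polish)
```

In this toy data the two directions of the brush and nail polish rule have identical support but different confidence, which mirrors the reasoning used in the redundancy discussion; a lift above 1 indicates that the items co-occur more often than independence would suggest.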
(iii) XL miner (Minimum confidence = 75%)
The output above shows the primary effect of raising the minimum confidence level: fewer rules
are captured in the output, because only rules that meet the confidence threshold are reported.
It is therefore advisable not to set this level too high, otherwise rules with a high lift ratio
may be excluded from the output owing to low confidence (Rouse, 2004).
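A minimal sketch of this thresholding effect follows. The rule table is entirely hypothetical (the rule names, confidences and lift values are invented for illustration and do not come from the XL miner output), and `filter_rules` is an assumed helper name.

```python
# Hypothetical rules; every value here is invented for illustration.
rules = [
    {"rule": "A -> B", "confidence": 0.90, "lift": 1.2},
    {"rule": "C -> D", "confidence": 0.60, "lift": 3.5},
    {"rule": "E -> F", "confidence": 0.80, "lift": 1.1},
]

def filter_rules(rules, min_confidence):
    """Keep only the rules whose confidence meets the minimum threshold."""
    return [r for r in rules if r["confidence"] >= min_confidence]

kept_at_50 = [r["rule"] for r in filter_rules(rules, 0.50)]
kept_at_75 = [r["rule"] for r in filter_rules(rules, 0.75)]

print(kept_at_50)
print(kept_at_75)
```

At the higher threshold the hypothetical rule "C -> D" is dropped even though it has the highest lift in the table, which is the trade-off the paragraph above warns about.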
Question 2
(a) Dendrogram
In the clustering process above, the number of clusters was set to three, so three clusters
have been formed in line with the dendrogram shown above.
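As a rough illustration of how fixing the number of clusters at three plays out, the sketch below merges the two closest clusters repeatedly and stops once three remain. It assumes centroid linkage with Euclidean distance (the linkage actually used for the dendrogram is not stated in the text), and the points and the `agglomerate` helper are hypothetical.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(sum(col) / len(points) for col in zip(*points))

def agglomerate(points, n_clusters):
    """Merge the two clusters with the closest centroids until n_clusters remain."""
    clusters = [[p] for p in points]  # start with one cluster per point
    while len(clusters) > n_clusters:
        best = None  # (distance, index_i, index_j) of the closest pair so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical two-dimensional points forming three visible groups.
points = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.1), (5.2, 4.9), (9.0, 1.0), (9.1, 1.2)]
clusters = agglomerate(points, 3)

for members in clusters:
    print(members)
```

Stopping the merges at three clusters plays the same role as reading three branches off the dendrogram; a dedicated tool would normally record every merge so that the full tree can be drawn.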

(b) Need for Normalisation
Without data normalisation, the validity of the results could be compromised. With raw data,
the centroid distance computation is distorted by differences in scale: a variable measured on
a larger scale receives an artificially high weight. Because cluster formation is driven by
these centroid distances, the resulting clusters would also be inaccurate. Hence, data
normalisation is recommended.
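The scale effect described above can be seen in a small sketch. The customer records are hypothetical (invented balances and flight transaction counts), z-score scaling is assumed as the normalisation method since the text does not name one, and `zscore_columns` is an assumed helper name.

```python
import math
import statistics

# Hypothetical records: (balance, flight transactions per year).
customers = [(20000.0, 1.0), (21000.0, 12.0), (60000.0, 2.0)]

def zscore_columns(rows):
    """Rescale each column to mean 0 and population standard deviation 1."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    stdevs = [statistics.pstdev(c) for c in columns]
    return [
        tuple((v - m) / s for v, m, s in zip(row, means, stdevs))
        for row in rows
    ]

# Raw Euclidean distances are dominated by the balance column.
raw_d01 = math.dist(customers[0], customers[1])
raw_d02 = math.dist(customers[0], customers[2])

# After scaling, both variables contribute on a comparable footing.
scaled = zscore_columns(customers)
norm_d01 = math.dist(scaled[0], scaled[1])
norm_d02 = math.dist(scaled[0], scaled[2])

print(raw_d01, raw_d02)
print(norm_d01, norm_d02)
```

On the raw values, the first customer appears far closer to the second than to the third purely because balance is measured on a much larger scale; once both columns are standardised, the large gap in flight transactions carries comparable weight and the two distances are of similar size.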
(c) Cluster Labeling
Cluster 1
Characteristics: 1) Flight transactions quite low 2) Non-flight transactions also the lowest
amongst the clusters 3) Balance also appears to be the lowest amongst the three clusters
4) Moderate spending power apparent (Liebowitz, 2015).
Label: “Middle Class Flyer”
Cluster 2
Characteristics: 1) Flight transactions quite high 2) Similar pattern visible for non-flight bonus
transactions 3) Balance value highest 4) High spending power visible
Label: “High Net Worth Flyer”
Cluster 3
Characteristics: 1) Flight transactions very low 2) However, non-flight bonus transactions very
high 3) Moderately high balance
Label: “Infrequent Flyer”
(d) K Means Clustering
The appropriate classification for the three clusters indicated above is as follows.
It is apparent that the K-means classification derived above does not conform to the
classification obtained from hierarchical clustering. Thus, the two clustering techniques
deployed do produce different patterns (Berkhin, 2015).
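One simple way to check whether two clusterings agree is to cross-tabulate their labels for the same records. The sketch below uses hypothetical label lists (not the assignment's actual assignments), and `cross_tabulate` and `clusterings_agree` are assumed helper names.

```python
from collections import Counter

# Hypothetical cluster assignments for the same eight customers.
hierarchical_labels = [1, 1, 1, 2, 2, 3, 3, 3]
kmeans_labels = [1, 1, 2, 2, 2, 3, 1, 3]

def cross_tabulate(labels_a, labels_b):
    """Count how many records fall in each (label_a, label_b) pair."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    return Counter(zip(labels_a, labels_b))

def clusterings_agree(table):
    """True when every cluster in one method maps to exactly one in the other."""
    rows = {a for a, _ in table}
    cols = {b for _, b in table}
    return len(table) == len(rows) == len(cols)

table = cross_tabulate(hierarchical_labels, kmeans_labels)

for (h, k), count in sorted(table.items()):
    print(f"hierarchical={h} kmeans={k} count={count}")
print(clusterings_agree(table))
```

Because cluster numbers are arbitrary, the check looks for a one-to-one mapping between the two sets of labels rather than identical numbers; any record that lands off that mapping indicates a genuine difference between the techniques.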
(e) Target Clusters and Offers
Cluster 2 would be one of the targets. The offer would take the form of greater reward points
for transactions made using the credit card issued to frequent flyers. Cluster 3 would also be
targeted owing to its sheer size and the opportunity it represents. Here the offer would take
the form of greater reward points, provided that more than 50% of bonus points are used on
flight transactions annually (Huang, 2014).

References
Abramowics, W. (2013). Business Information Systems Workshops: BIS 2013 International
Workshops (5th ed.). New York: Springer.
Ana, A. (2014). Integration of Data Mining in Business Intelligence System (4th ed.). Sydney:
IGA Global.
Berkhin, P. (2015). Survey of Clustering Data Mining Techniques. Accrue Software, Inc. 123-47.
https://www.cc.gatech.edu/~isbell/reading/papers/berkhin02survey.pdf
Huang, Z. (2014). Clustering Large Data Sets with Mixed Numeric and Categorical Values.
CSIRO Mathematical and Information Sciences. 16(2), 45-78.
https://grid.cs.gsu.edu/~wkim/index_files/papers/kprototype.pdf
Liebowitz, J. (2015). Business Analytics: An Introduction (2nd ed.). New York: CRC Press.
Rouse, M. (2004). Association Rules in Data Mining. Retrieved from
http://searchbusinessanalytics.techtarget.com/definition/association-rules-in-data-mining
Zaki, M.J. (2000). Generating non-redundant association rules. In: Proceedings of the ACM
SIGKDD, pp. 35–143.