Data Mining Assignment: XLMiner, Dendrogram, and K-Means Analysis

Added on  2020/03/16

Homework Assignment
AI Summary
This data mining assignment solution provides a detailed analysis of association rules and clustering techniques. The solution begins with an interpretation of association rules generated by XLMiner, highlighting conditional probabilities and the concept of rule redundancy. It then explores hierarchical clustering using a dendrogram, discussing the sensitivity of the method to extreme values and the identification of sub-clusters. The assignment further examines K-Means clustering, comparing its output with hierarchical clustering and providing cluster labeling. The solution also addresses the impact of changing the minimum confidence level in association rule mining. The document concludes with references to relevant research papers and textbooks on data mining concepts.
Document Page
Data Mining
Student Id and Name
[Pick the date]
Question 1
XLMiner Output
(i) The interpretation of the required rules is highlighted below.
Rule 1 - The conditional probability (confidence) that a customer purchases nail polish,
given that a brush has already been purchased, is 1.
Rule 2 - The conditional probability that a customer purchases a brush, given that nail
polish has already been purchased, is 0.6322 (Liebowitz, 2015).
Rule 3 - The conditional probability that a customer purchases bronzer, given that nail
polish has already been purchased, is 0.5919.
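The conditional probabilities above can be reproduced in miniature. The sketch below is illustrative only: the five transactions are made-up toy data, not the assignment dataset, and the helper names `support`, `confidence` and `lift` are assumptions rather than XLMiner functions. It shows that confidence depends on the direction of the rule, whereas lift is the same in both directions.

```python
# Toy transactions (hypothetical, NOT the assignment data).
transactions = [
    {"brush", "nail polish"},
    {"brush", "nail polish", "bronzer"},
    {"nail polish", "bronzer"},
    {"nail polish"},
    {"bronzer"},
]


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent):
    """Estimate of P(consequent | antecedent)."""
    return support(antecedent | consequent) / support(antecedent)


def lift(antecedent, consequent):
    """Confidence relative to the baseline frequency of the consequent."""
    return confidence(antecedent, consequent) / support(consequent)


conf_brush_to_polish = confidence({"brush"}, {"nail polish"})
conf_polish_to_brush = confidence({"nail polish"}, {"brush"})
lift_brush_to_polish = lift({"brush"}, {"nail polish"})
lift_polish_to_brush = lift({"nail polish"}, {"brush"})

print(conf_brush_to_polish, conf_polish_to_brush)
print(lift_brush_to_polish, lift_polish_to_brush)
```

In this toy data every basket with a brush also contains nail polish, so the first confidence is 1, while the reverse rule has a lower confidence because nail polish is also bought without a brush.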
(ii) The redundancy of a rule is judged on the basis of its ancestor rule and the relevant
support level. This may be demonstrated for the given data using rule 2. Its ancestor is
rule 1, whose support, as measured through the lift ratio, is the same as that of rule 2.
Hence, the inclusion of rule 1 potentially renders rule 2 redundant (Zaki, 2000).
The given rules need to be assessed keeping in mind the underlying support and confidence,
both of which are essential. Typically, higher support implies greater importance, but the
confidence level is also expected to be high. Rules for which both measures are high tend to
offer the greatest utility. Besides, the minimum confidence level and support are chosen
keeping the end objective in mind (Liebowitz, 2015).
(iii) The relevant output when the minimum confidence level is changed from 50% to 75% is
shown below.
One obvious effect of increasing the minimum confidence level is that fewer rules appear, since
the threshold confidence level is now pegged at 75% instead of 50%, and hence rules with a
confidence level between 50% and 75% are no longer highlighted. However, choosing an
excessively high confidence level can be problematic, as it may sacrifice rules that enjoy
high support but lack the requisite confidence (Liebowitz, 2015).
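The effect of the threshold can be sketched with the three rules interpreted in part (i). The confidence values are those quoted above; the list `rules` and the helper `rules_at` are hypothetical names used only for illustration, and the full XLMiner output may contain further rules that are not discussed here.

```python
# Confidence values as quoted in part (i); the rule labels are descriptive only.
rules = [
    ("Rule 1: brush -> nail polish", 1.0),
    ("Rule 2: nail polish -> brush", 0.6322),
    ("Rule 3: nail polish -> bronzer", 0.5919),
]


def rules_at(min_confidence):
    """Return the names of rules whose confidence meets the threshold."""
    return [name for name, conf in rules if conf >= min_confidence]


kept_at_50 = rules_at(0.50)
kept_at_75 = rules_at(0.75)

print(len(kept_at_50), "rules at 50%:", kept_at_50)
print(len(kept_at_75), "rules at 75%:", kept_at_75)
```

Of these three rules, only rule 1 survives the 75% threshold; rules 2 and 3 fall in the 50% to 75% band and drop out.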
Question 2
(a) Dendrogram
While running the clustering using Ward’s method, the number of clusters was set to 3.
Accordingly, three clusters can be identified from the above dendrogram if a horizontal line
is drawn just below a distance of 1000. Further, the diagram shows a large number of
sub-clusters (Rouse, 2004).
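As a rough illustration of what the dendrogram represents, the sketch below merges clusters bottom-up using Ward's criterion (the increase in within-cluster sum of squares) and stops once three clusters remain, which corresponds to cutting the tree with a horizontal line. It is a naive pure-Python version on six made-up points, not the XLMiner procedure; the function names are assumptions.

```python
from itertools import combinations


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    squared_distance = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * squared_distance


def ward_clusters(points, k):
    """Repeatedly merge the cheapest pair of clusters until k remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: ward_cost(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters


# Toy 2-D points (hypothetical), forming three visibly separate groups.
points = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.1), (5.2, 4.9), (9.0, 1.0), (9.1, 1.2)]
groups = ward_clusters(points, 3)
for group in groups:
    print(sorted(group))
```

Each merge performed before the stopping point corresponds to a sub-cluster lower down in the dendrogram.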
(b) Hierarchical clustering is quite sensitive to the presence of extreme values, so failing
to normalise the data would adversely affect the accuracy and utility of the output
obtained. The effect would be most visible as distorted distance computations between
the centroids, which would in turn affect the cluster formation process and thus
undermine the whole process (Berkhin, 2015).
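The distortion can be seen on a small scale. In the sketch below, the three customers and the two variables (a miles balance and a count of flight transactions) are made-up values chosen only to differ widely in scale; z-score scaling is one common form of normalisation and is assumed here, not taken from the assignment.

```python
from math import dist
from statistics import mean, pstdev

# Hypothetical customers: (miles balance, flight transactions).
customers = [(20000.0, 1.0), (21000.0, 12.0), (60000.0, 1.0)]


def zscore_columns(rows):
    """Rescale each column to mean 0 and population standard deviation 1."""
    columns = list(zip(*rows))
    stats = [(mean(col), pstdev(col)) for col in columns]
    return [
        tuple((value - m) / s for value, (m, s) in zip(row, stats))
        for row in rows
    ]


# Raw distances are dominated by the balance column because of its scale.
raw_01 = dist(customers[0], customers[1])
raw_02 = dist(customers[0], customers[2])

scaled = zscore_columns(customers)
scaled_01 = dist(scaled[0], scaled[1])
scaled_02 = dist(scaled[0], scaled[2])

print("raw:   ", round(raw_01, 2), round(raw_02, 2))
print("scaled:", round(scaled_01, 3), round(scaled_02, 3))
```

Without scaling, the first two customers look almost identical even though their flight activity differs markedly; after scaling, that difference carries comparable weight to the difference in balance.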
(c) Cluster 1
For this cluster, both flight transactions and bonus non-flight transactions in the recent past have
been low. Also, no miles are being counted towards Topflight status, indicating that
these are not premium customers. Thus, these can be termed “Middle Class Flyers”
(Zaki, 2000).
Cluster 2
For this cluster, both flight transactions and bonus non-flight transactions in the recent past have
been high. Also, there is a high balance of miles eligible for award travel.
Thus, these can be termed “High Networth Flyers”.
Cluster 3
For this cluster, there is a high concentration of non-flight transactions, but flight transactions
continue to lag. Also, the balance of miles eligible for award travel is significant, and hence it
would be apt to categorize these as “Infrequent Flyers”.
(d) K Means Clustering Output
Number of Clusters formed: 3
Cluster 1: “High Networth Flyers” – High flight transactions, high qual_miles, high balance
Cluster 2: “Middle Class Flyers” – Low flight and non-flight transactions, low balance
Cluster 3: “Infrequent Flyers” – Clear divergence between flight and non-flight transactions
Based on the above, it would be appropriate to point out that the respective cluster labeling does
not match that of hierarchical clustering, and thus the output differs for the two (Zaki, 2000).
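For reference, the K-Means procedure can be sketched as repeated assignment of points to the nearest centroid followed by recomputation of the centroids. The version below is a minimal pure-Python illustration on made-up (flight transactions, non-flight transactions) pairs with hand-picked starting centroids; it is not the XLMiner implementation, and the function name `kmeans` is an assumption.

```python
from math import dist


def kmeans(points, centroids, iterations=10):
    """Plain Lloyd iterations from the given starting centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else old
            for members, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical customers: (flight transactions, non-flight transactions).
points = [(1, 2), (2, 1), (10, 12), (11, 11), (1, 12), (2, 13)]
start = [points[0], points[2], points[4]]  # hand-picked, for repeatability
centroids, clusters = kmeans(points, start)

for number, (center, members) in enumerate(zip(centroids, clusters), start=1):
    print("Cluster", number, center, members)
```

The cluster numbers are an artefact of the starting centroids, so cluster 1 from K-Means need not correspond to cluster 1 from hierarchical clustering; comparing the two outputs is best done on cluster profiles, not on cluster numbers.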
(e) The target clusters are Cluster 2 and Cluster 3 with the respective offers summarised below.
References
Berkhin, P. (2015). Survey of clustering data mining techniques. Accrue Software, Inc., 123-47.
https://www.cc.gatech.edu/~isbell/reading/papers/berkhin02survey.pdf
Liebowitz, J. (2015). Business Analytics: An Introduction (2nd ed.). New York: CRC Press.
Rouse, M. (2004). Association Rules in Data Mining. Retrieved from
http://searchbusinessanalytics.techtarget.com/definition/association-rules-in-data-mining
Zaki, M.J. (2000). Generating non-redundant association rules. In: Proceedings of the ACM
SIGKDD, pp. 34–43.