Data Mining Assignment: Association Rules and XL Miner Output

Added on 2020/04/07

DATA MINING
STUDENT ID:

Table of Contents
Question 1
Question 2
Reference

Question 1
The association rules generated with a minimum confidence level of 50% are outlined below.
i) The first three rules are interpreted as follows. Rule 1 states, with 100% confidence,
that if the antecedent item (brush) is purchased, the consequent item (nail polish) is also
purchased. Rule 2 states, with 63.22% confidence, that if nail polish is purchased, a brush
is also purchased. Rule 3 states, with 59.20% confidence, that if nail polish is purchased,
bronzer is also purchased (Shmueli et al., 2016).
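The confidence figures above follow from the standard definition of confidence as a conditional proportion of transactions. A short worked step, using only the two percentages reported for Rules 1 and 2, shows what they jointly imply about the supports of the items involved:

```latex
\[
\operatorname{conf}(A \Rightarrow C) = \frac{\operatorname{supp}(A \cup C)}{\operatorname{supp}(A)}
\]
% Rule 1: brush => nail polish, confidence = 1.00
\[
\frac{\operatorname{supp}(\text{brush} \cup \text{nail polish})}{\operatorname{supp}(\text{brush})} = 1
\;\Longrightarrow\;
\operatorname{supp}(\text{brush} \cup \text{nail polish}) = \operatorname{supp}(\text{brush})
\]
% Rule 2: nail polish => brush, confidence = 0.6322
\[
\frac{\operatorname{supp}(\text{brush} \cup \text{nail polish})}{\operatorname{supp}(\text{nail polish})}
= \frac{\operatorname{supp}(\text{brush})}{\operatorname{supp}(\text{nail polish})} = 0.6322
\]
```

Put simply, every transaction containing a brush also contains nail polish, while brushes appear in roughly 63% of the transactions that contain nail polish.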
ii) Among the first 24 rules, Rule 2 is redundant in the context of Rule 1. Both rules involve
the same itemset (brush and nail polish) with the antecedent and consequent swapped, so they
share the same support and the same lift ratio; Rule 2 therefore adds little information
beyond Rule 1.
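To make the redundancy argument concrete, the sketch below computes support, confidence and lift for a rule and for its reverse. The baskets are hypothetical toy transactions, not the assignment's dataset, and the helper names `support` and `rule_metrics` are illustrative. Swapping antecedent and consequent leaves support and lift unchanged and alters only confidence.

```python
from fractions import Fraction

# Hypothetical toy transactions (NOT the assignment's data): each set is one basket.
TRANSACTIONS = [
    {"brush", "nail polish", "bronzer"},
    {"brush", "nail polish"},
    {"nail polish", "bronzer"},
    {"nail polish"},
    {"bronzer", "concealer"},
    {"brush", "nail polish", "concealer"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


def rule_metrics(antecedent, consequent, transactions):
    """Return (support, confidence, lift) for the rule antecedent => consequent."""
    both = support(antecedent | consequent, transactions)
    confidence = both / support(antecedent, transactions)
    lift = confidence / support(consequent, transactions)
    return both, confidence, lift


brush, polish = {"brush"}, {"nail polish"}
forward = rule_metrics(brush, polish, TRANSACTIONS)  # brush => nail polish
reverse = rule_metrics(polish, brush, TRANSACTIONS)  # nail polish => brush

print("brush => nail polish:", [str(x) for x in forward])
print("nail polish => brush:", [str(x) for x in reverse])
```

Exact fractions are used so that the equality of support and lift across the two directions is not obscured by floating-point rounding.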
The usefulness of a rule is assessed using the lift ratio, which compares the rule's confidence
with the confidence that would be expected if the antecedent and consequent were independent
(the overall support of the consequent). For this reason, the rules are ranked by decreasing
lift ratio rather than by confidence. Confidence and support nevertheless remain important:
support shows how often the items are bought together, and confidence shows how reliably the
antecedent predicts the consequent, so both provide vital information about customer
behaviour (Liebowitz, 2015).
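The ranking criterion can be written out explicitly. Lift divides a rule's confidence by the support of its consequent, which is the confidence one would expect if antecedent and consequent occurred independently:

```latex
\[
\operatorname{lift}(A \Rightarrow C)
= \frac{\operatorname{conf}(A \Rightarrow C)}{\operatorname{supp}(C)}
= \frac{\operatorname{supp}(A \cup C)}{\operatorname{supp}(A)\,\operatorname{supp}(C)}
\]
```

A lift above 1 means the antecedent raises the likelihood of the consequent, a lift of 1 indicates independence, and a lift below 1 indicates a negative association. The right-hand form is symmetric in A and C, which is why a rule and its reverse always share the same lift.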

iii) If the minimum confidence level is raised from 50% to 75%, fewer rules will appear in the
association rule output, because only rules whose confidence meets the new threshold are
retained. Based on the output listed above, only a single rule would satisfy the 75% minimum
confidence level.
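The effect of raising the threshold can be checked with a small sketch. It enumerates one-item-to-one-item rules on hypothetical toy baskets (not the assignment's data), keeps those meeting a minimum confidence, and ranks the survivors by lift. The function name `single_item_rules` is illustrative; real tools such as XL Miner also apply a minimum support and consider multi-item antecedents, which this sketch omits.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical toy baskets for illustration only; NOT the assignment's dataset.
TRANSACTIONS = [
    {"brush", "nail polish", "bronzer"},
    {"brush", "nail polish"},
    {"nail polish", "bronzer"},
    {"nail polish"},
    {"bronzer", "concealer"},
    {"brush", "nail polish", "concealer"},
]


def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    return Fraction(sum(1 for t in transactions if itemset <= t), len(transactions))


def single_item_rules(transactions, min_confidence):
    """Return one-item => one-item rules with confidence >= min_confidence, as
    (antecedent, consequent, confidence, lift) tuples sorted by decreasing lift."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, c in permutations(items, 2):
        confidence = support({a, c}, transactions) / support({a}, transactions)
        if confidence >= min_confidence:
            lift = confidence / support({c}, transactions)
            rules.append((a, c, confidence, lift))
    rules.sort(key=lambda rule: rule[3], reverse=True)
    return rules


at_50 = single_item_rules(TRANSACTIONS, Fraction(50, 100))
at_75 = single_item_rules(TRANSACTIONS, Fraction(75, 100))

for label, rules in (("50%", at_50), ("75%", at_75)):
    print(f"min confidence {label}: {len(rules)} rule(s)")
    for a, c, conf, lift in rules:
        print(f"  {a} => {c}  confidence={float(conf):.2f}  lift={float(lift):.2f}")
```

Because the 75% list comes from the same enumeration with a stricter cut-off, it is always a subset of the 50% list; on this toy data only brush => nail polish survives.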
Question 2
a) XL Miner Output
Based on the above output and the inputs given, three clusters are obtained. This is also
apparent from the dendrogram: a horizontal line drawn at a distance of 1000 intersects it at
three points, giving three clusters.
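Reading the number of clusters off a dendrogram amounts to stopping the agglomeration once the next merge would occur above the chosen height. A minimal pure-Python sketch follows; it assumes single linkage and Euclidean distance (the assignment does not state which linkage XL Miner used), and the points and the threshold of 5.0 are hypothetical stand-ins for the real records and the cut-off of 1000.

```python
import math
from itertools import combinations

# Hypothetical 2-D points standing in for (normalised) customer records.
POINTS = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (20, 0), (21, 1)]


def single_linkage(c1, c2, points):
    """Single-linkage distance: the smallest distance between members of two clusters."""
    return min(math.dist(points[i], points[j]) for i in c1 for j in c2)


def cut_dendrogram(points, threshold):
    """Merge clusters (single linkage) while the closest pair is no farther than
    `threshold`. Because single-linkage merge heights never decrease, stopping here
    matches cutting the dendrogram with a horizontal line at that height."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > 1:
        (a, b), d = min(
            (((a, b), single_linkage(clusters[a], clusters[b], points))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if d > threshold:
            break
        clusters[a].extend(clusters[b])
        del clusters[b]  # a < b, so index a is unaffected
    return clusters


for h in (0.5, 5.0, 15.0):
    print(f"cut at {h}: {len(cut_dendrogram(POINTS, h))} cluster(s)")
```

Lowering the cut-off yields more, smaller clusters and raising it yields fewer, which is why the chosen distance determines the number of clusters read from the dendrogram.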
b) The data in the above output was normalised. Had it not been, the distance computations
used to form the clusters would be distorted by differences in scale: variables measured on
larger scales would receive disproportionate weight, so a single variable such as account
balance could dominate the distances and produce misleading clusters (Grossmann &
Rinderle-Ma, 2015).
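The scale effect is easy to demonstrate. The sketch below uses hypothetical records with two variables on very different scales (account balance and number of flight transactions) and assumes z-score standardisation as the normalisation method. It reports how much of the squared Euclidean distance between two records each variable contributes before and after standardising.

```python
from statistics import mean, pstdev

# Hypothetical records: (account balance, number of flight transactions).
RECORDS = [(12000.0, 1.0), (12500.0, 12.0), (60000.0, 2.0)]


def z_scores(records):
    """Standardise each column to mean 0 and population standard deviation 1."""
    stats = [(mean(col), pstdev(col)) for col in zip(*records)]
    return [tuple((x - m) / s for x, (m, s) in zip(row, stats)) for row in records]


def squared_shares(p, q):
    """Share of the squared Euclidean distance between p and q due to each variable."""
    parts = [(x - y) ** 2 for x, y in zip(p, q)]
    total = sum(parts)
    return [part / total for part in parts]


scaled_records = z_scores(RECORDS)
raw = squared_shares(RECORDS[0], RECORDS[1])
scaled = squared_shares(scaled_records[0], scaled_records[1])
print("raw shares (balance, flights):", [round(s, 4) for s in raw])
print("z-score shares (balance, flights):", [round(s, 4) for s in scaled])
```

On the raw values, a 500-unit balance gap swamps an 11-flight gap; after standardisation the flight difference, which is large relative to its own spread, drives the distance instead.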

c) The three clusters for the given data may be labelled as shown below.
d) The relevant K-Means clustering output obtained from XL Miner is as follows.
Although both methods produce three clusters, the outputs could only be considered consistent
if each cluster from one method had a matching cluster with a similar profile in the other.
Clearly, this is not the case here. Consider Cluster 2. In the hierarchical clustering output,
Cluster 2 corresponds to the high net-worth flyers. Cluster 2 in the K-Means output, however,
cannot represent high net-worth flyers: it has the lowest balance of all the clusters, and its
flight transaction frequency is also low, a profile closer to middle-class flyers. Hence, the
two clustering outputs differ (Liebowitz, 2015).
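Whether two clustering methods agree can also be checked by cross-tabulating each record's hierarchical and K-Means cluster assignments, since cluster numbers are arbitrary labels and need not line up between methods. The sketch below uses hypothetical labels for ten records, and `best_match_agreement` is an illustrative purity-style score, not an XL Miner feature.

```python
from collections import Counter

# Hypothetical cluster labels for ten records (NOT the assignment's output).
HIERARCHICAL = [1, 1, 1, 2, 2, 2, 3, 3, 3, 3]
KMEANS = [3, 3, 3, 1, 1, 2, 2, 2, 2, 2]


def crosstab(labels_a, labels_b):
    """Count records falling in each (method-A cluster, method-B cluster) pair."""
    return Counter(zip(labels_a, labels_b))


def best_match_agreement(labels_a, labels_b):
    """Share of records whose method-B cluster is the most common one within their
    method-A cluster: a rough, label-independent consistency score."""
    table = crosstab(labels_a, labels_b)
    matched = sum(
        max(n for (a, _), n in table.items() if a == cluster)
        for cluster in set(labels_a)
    )
    return matched / len(labels_a)


for (h, k), n in sorted(crosstab(HIERARCHICAL, KMEANS).items()):
    print(f"hierarchical {h} / K-Means {k}: {n} record(s)")
print("best-match agreement:", best_match_agreement(HIERARCHICAL, KMEANS))
```

In this toy example hierarchical Cluster 1 coincides entirely with K-Means Cluster 3, so comparing clusters by shared membership or profile, rather than by number alone, gives the fairer picture of how far the two methods agree.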
e) The clusters to target, together with the potential offers for each, are shown below.

Reference
Grossmann, W. & Rinderle-Ma, S. (2015) Fundamentals of Business Intelligence (2nd ed.).
New York: Springer.
Liebowitz, J. (2015) Business Analytics: An Introduction (2nd ed.). New York: CRC Press.
Shmueli, G., Bruce, P. C., Yahav, I., Patel, N. R., & Lichtendahl, K. C., Jr. (2016) Data
Mining for Business Analytics: Concepts, Techniques, and Applications (2nd ed.). London:
John Wiley & Sons.