Data Mining Assignment - Analysis of Association Rules and Clusters


Added on 2020/03/16

AI Summary

This data mining assignment explores association rules and clustering techniques. The assignment presents an analysis of association rules, highlighting confidence levels and rule redundancy. It also includes a dendrogram analysis, discussing cluster formation and the impact of normalization. Furthermore, the document covers K-Means clustering, comparing the results with hierarchical clustering and identifying clusters based on flight transaction data. The conclusion provides targeted offers based on cluster analysis.

DATA MINING
STUDENT ID:


Question 1
Output (Association Rules)
i) Interpretation
Rule 1 indicates, at a 100% confidence level, that a purchase of a brush is accompanied by a
purchase of nail polish. Rule 2 indicates, at a 63.22% confidence level, that a purchase of nail
polish is accompanied by a purchase of a brush. Rule 3 indicates, at a 59.19% confidence level,
that a purchase of nail polish is accompanied by a purchase of bronzer.
ii) A rule is redundant with respect to another rule when the former offers the same support and
confidence as the latter on every data set. In the given data, rule 2 appears redundant with
respect to rule 1, since both rules involve the same pair of items (brush and nail polish);
however, the confidence level of rule 2 (63.22%) is lower than that of rule 1 (100%).
iii) The minimum confidence level chosen determines the number of rules displayed in the output.
For instance, the output above shows rules with a confidence level greater than 50%. If the
minimum confidence level were raised to 75%, only the first rule would be displayed, since it
alone has a confidence level of at least 75%.
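The interpretation above can be sketched in a few lines of Python. This is only an illustration: the `transactions` list is a hypothetical toy basket set (not the assignment's data), the helper names `support`, `confidence` and `filter_rules` are made up for this sketch, and `reported_rules` simply restates the three confidence levels quoted in the answer.

```python
# Hypothetical toy transactions (NOT the assignment's data set).
transactions = [
    {"brush", "nail polish"},
    {"brush", "nail polish", "bronzer"},
    {"nail polish", "bronzer"},
    {"nail polish"},
    {"bronzer"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def filter_rules(rules, min_confidence):
    """Keep rules whose confidence is at least `min_confidence`."""
    return [rule for rule in rules if rule[2] >= min_confidence]


# The three rules and confidence levels reported in part i).
reported_rules = [
    ("brush", "nail polish", 1.0000),
    ("nail polish", "brush", 0.6322),
    ("nail polish", "bronzer", 0.5919),
]

if __name__ == "__main__":
    # Both directions of a rule share one itemset, hence one support value,
    # but their confidence levels can differ.
    print(confidence({"brush"}, {"nail polish"}, transactions))
    print(confidence({"nail polish"}, {"brush"}, transactions))
    print(filter_rules(reported_rules, 0.50))
    print(filter_rules(reported_rules, 0.75))
```

With a threshold of 0.75 the filter keeps only the brush to nail polish rule, which matches the reasoning in part iii).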
Question 2
a) Dendrogram Output

For the above dendrogram, assuming a cut-off distance of 1,000, three clusters would be
observed, along with a number of smaller sub-clusters.
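To illustrate what cutting a dendrogram at a given distance means, the following is a minimal sketch of agglomerative clustering that stops merging once the closest pair of clusters is farther apart than the cut-off. The linkage method used in the assignment is not stated, so single linkage is an assumption here, and both the `points` and the function name `single_linkage_clusters` are hypothetical.

```python
import math


def single_linkage_clusters(points, cutoff):
    """Merge the two closest clusters repeatedly until the smallest
    inter-cluster distance exceeds `cutoff` (single linkage assumed)."""
    clusters = [[p] for p in points]

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(x, y) for x in a for y in b)

    while len(clusters) > 1:
        d, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > cutoff:
            break  # every remaining merge lies above the cut-off line
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical 2-D observations forming three well-separated groups.
points = [
    (0, 0), (300, 400),
    (5000, 0), (5200, 100),
    (0, 9000), (100, 9500),
]

if __name__ == "__main__":
    for cluster in single_linkage_clusters(points, cutoff=1000):
        print(cluster)
```

Lowering the cut-off splits the groups into smaller sub-clusters, which corresponds to reading the dendrogram at a lower height.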
b) Absence of normalisation would give rise to the following issues.
• The distance computation would be inaccurate unless equal weights are given to the
variables of interest.
• The variable with the largest scale would tend to dominate the distance measure; hence,
normalisation is critical to eliminate this effect.
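The scale effect described above can be sketched with a small example. The `records` are hypothetical values (a miles balance and a count of flight transactions), and z-score standardisation is assumed as the normalisation method, since the assignment does not say which one was applied.

```python
import math
import statistics

# Hypothetical records: (miles balance, flight transactions in last 12 months).
records = [
    (20000.0, 1.0),   # A
    (21000.0, 16.0),  # B: similar miles to A, very different flight activity
    (60000.0, 2.0),   # C: similar flight activity to A, very different miles
]


def z_scores(rows):
    """Standardise each column to mean 0 and (population) std deviation 1."""
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    stdevs = [statistics.pstdev(c) for c in columns]
    return [
        tuple((v - m) / s for v, m, s in zip(row, means, stdevs))
        for row in rows
    ]


if __name__ == "__main__":
    normalised = z_scores(records)
    # Raw distances are driven almost entirely by the miles column.
    print("raw  A-B:", math.dist(records[0], records[1]))
    print("raw  A-C:", math.dist(records[0], records[2]))
    # After standardisation both variables contribute on a comparable scale.
    print("norm A-B:", math.dist(normalised[0], normalised[1]))
    print("norm A-C:", math.dist(normalised[0], normalised[2]))
```

On the raw values A looks far closer to B than to C purely because miles are measured in much larger units; after standardisation the gap in flight transactions between A and B carries comparable weight.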
c) The various clusters are named in the following table.
d) K-Means Clustering
Observation:
• Cluster 1 – High Net-worth Flyers, owing to average flight transactions exceeding 15 in
the last 12 months.
• Cluster 2 – Middle Class Flyers, owing to non-flight bonus transactions and the miles earned
from them being the lowest.
• Cluster 3 – Non-Frequent Flyers, owing to non-flight bonus transactions and the miles earned
from them being quite significant while flight transactions are quite low.
Conclusion: The results obtained from K-Means clustering do not resemble the corresponding
results from hierarchical clustering.
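For readers who want to see how cluster averages of this kind arise, here is a minimal K-Means sketch (plain Lloyd iterations). It is not the tool used in the assignment: the `points` are hypothetical pairs of (flight transactions, non-flight bonus transactions), the starting centroids are picked by hand, and the function name `k_means` is made up for this illustration.

```python
import math


def k_means(points, centroids, max_iter=100):
    """Alternate between assigning points to the nearest centroid and
    recomputing each centroid as the mean of its assigned points."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: mean of the members; keep the old centroid if empty.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append(
                    tuple(sum(v) / len(members) for v in zip(*members))
                )
            else:
                new_centroids.append(old)
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids


# Hypothetical data: (flight transactions, non-flight bonus transactions).
points = [
    (16, 10), (18, 12), (17, 11),  # many flight transactions
    (2, 1), (3, 2), (1, 1),        # little activity of either kind
    (2, 20), (1, 22), (3, 21),     # few flights, many bonus transactions
]

if __name__ == "__main__":
    labels, centroids = k_means(points, [points[0], points[3], points[6]])
    print(labels)
    print(centroids)
```

The final centroids play the role of the per-cluster averages used above to name the clusters. Results depend on the starting centroids and on whether the variables were normalised first, so different runs or methods need not agree.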
e) The clusters to be targeted, along with the corresponding offers, are outlined as follows.