Data Mining Assignment: Association, Clustering & XLMiner Analysis

Added on 2020/03/16

AI Summary

This assignment solution delves into data mining techniques, focusing on association rules and clustering using XLMiner. The first part examines association rules, addressing the issue of redundant rules and evaluating rules based on lift ratio and confidence levels. The second part explores clustering, including dendrogram analysis and K-Means clustering. The impact of data normalization and the differences in cluster formation between hierarchical and K-Means clustering are discussed. The solution provides an analysis of customer behavior and offers insights for targeted marketing strategies. The assignment demonstrates the application of these data mining methods to extract meaningful patterns and insights from datasets.

DATA MINING
Student id
[Pick the date]

Table of Contents
Question 1
Question 2
Reference

Question 1
Association rule
XLMiner was used to apply association rules to the provided dataset. The XLMiner output is
shown below. In the first case, the minimum confidence was set to 50%.
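The rule-generation step can be sketched in plain Python. This is a minimal illustration under stated assumptions, not XLMiner's implementation: the transactions are hypothetical toy data, only rules with a single item on each side are generated, candidate pairs are enumerated directly instead of being pruned with the Apriori algorithm, and the helper names `support` and `one_to_one_rules` (and the `min_support` parameter) are hypothetical.

```python
from itertools import permutations

# Hypothetical toy transactions (NOT the assignment's cosmetics dataset).
TRANSACTIONS = [
    {"Brushes", "Nail Polish", "Bronzer"},
    {"Brushes", "Nail Polish"},
    {"Nail Polish", "Bronzer"},
    {"Nail Polish"},
    {"Bronzer", "Mascara"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def one_to_one_rules(transactions, min_support, min_confidence):
    """Generate rules {A} -> {C} that pass both thresholds (hypothetical helper)."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, c in permutations(items, 2):
        supp_ac = support({a, c}, transactions)
        if supp_ac < min_support:
            continue  # itemset too rare to report
        conf = supp_ac / support({a}, transactions)
        if conf >= min_confidence:
            lift = conf / support({c}, transactions)
            rules.append((a, c, supp_ac, conf, lift))
    # Report order: highest confidence first, then highest lift.
    return sorted(rules, key=lambda r: (-r[3], -r[4]))


if __name__ == "__main__":
    for min_conf in (0.5, 0.75):
        rules = one_to_one_rules(TRANSACTIONS, min_support=0.2, min_confidence=min_conf)
        print(f"min confidence {min_conf:.0%}: {len(rules)} rules")
        for a, c, s, conf, lift in rules:
            print(f"  {a} -> {c}: support={s:.2f}, confidence={conf:.2f}, lift={lift:.2f}")
```

With these toy transactions, a 50% threshold keeps six rules and a 75% threshold keeps only two, which mirrors the kind of reduction discussed later for the real data.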
(i) The first three rules are interpreted below (Ana, 2014).
Rule 1 – With 100% confidence, a customer who buys brushes also buys nail polish; that is, every
transaction that contains brushes also contains nail polish.
Rule 2 – With 63.21% confidence, a customer who buys nail polish also buys brushes; that is,
63.21% of the transactions that contain nail polish also contain brushes.
Rule 3 – With 59.19% confidence, a customer who buys nail polish also buys bronzer; that is,
59.19% of the transactions that contain nail polish also contain bronzer.
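Confidence is a ratio of transaction counts. In the worked form below, n(A ∪ C) denotes the number of transactions that contain every item of both the antecedent A and the consequent C (this notation follows the usual textbook convention and is not taken from the XLMiner output). Combining the reported confidences of Rules 1 and 2 shows that the transactions containing brushes number about 63% of those containing nail polish:

```latex
\begin{aligned}
\operatorname{conf}(A \Rightarrow C) &= \frac{n(A \cup C)}{n(A)} \\
\operatorname{conf}(\text{Brushes} \Rightarrow \text{Nail Polish})
  &= \frac{n(\text{Brushes} \cup \text{Nail Polish})}{n(\text{Brushes})} = 1.00
  \;\Longrightarrow\; n(\text{Brushes} \cup \text{Nail Polish}) = n(\text{Brushes}) \\
\operatorname{conf}(\text{Nail Polish} \Rightarrow \text{Brushes})
  &= \frac{n(\text{Brushes} \cup \text{Nail Polish})}{n(\text{Nail Polish})} = 0.6321
  \;\Longrightarrow\; \frac{n(\text{Brushes})}{n(\text{Nail Polish})} = 0.6321
\end{aligned}
```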
(i) Redundant rules are a key issue in association rule mining: they lengthen the rule list
without adding information and make it harder to interpret, so they should be eliminated. As
the name suggests, a redundant rule expresses the same relationship as another rule, typically
with the same support. In the given case, Rule 2 (Nail Polish → Brushes) appears to be rendered
redundant by Rule 1 (Brushes → Nail Polish), because both rules are built from the same itemset
{Brushes, Nail Polish} and therefore have identical support and lift (Shumueli, et al., 2016).
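Because a rule and its reverse are built from the same itemset, they always share the same support and the same lift and differ only in confidence. The sketch below checks this on hypothetical toy transactions and shows one simple way to drop such pairs (keeping the higher-confidence direction). The helper names are illustrative, and XLMiner does not necessarily apply this particular filter.

```python
# Hypothetical toy transactions (not the assignment dataset).
TRANSACTIONS = [
    {"Brushes", "Nail Polish"},
    {"Brushes", "Nail Polish", "Bronzer"},
    {"Nail Polish", "Bronzer"},
    {"Bronzer"},
    {"Mascara"},
]


def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def rule_metrics(antecedent, consequent, transactions):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    supp_both = support(antecedent | consequent, transactions)
    confidence = supp_both / support(antecedent, transactions)
    lift = confidence / support(consequent, transactions)
    return supp_both, confidence, lift


def drop_reverse_duplicates(rules):
    """Keep one rule per itemset: the direction with the highest confidence."""
    best = {}
    for antecedent, consequent, confidence in rules:
        key = frozenset(antecedent | consequent)
        if key not in best or confidence > best[key][2]:
            best[key] = (antecedent, consequent, confidence)
    return list(best.values())


forward = rule_metrics({"Brushes"}, {"Nail Polish"}, TRANSACTIONS)
reverse = rule_metrics({"Nail Polish"}, {"Brushes"}, TRANSACTIONS)
print("Brushes -> Nail Polish  (support, confidence, lift):", forward)
print("Nail Polish -> Brushes  (support, confidence, lift):", reverse)
```

Both directions report the same support and lift, so the pair carries one piece of information; only the confidence tells the reader which direction is more reliable.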
The utility of the association rules can be assessed using the following two parameters:
• Lift ratio
• Confidence level
A lift ratio greater than 1 indicates that the antecedent and consequent are bought together more
often than would be expected if the purchases were independent, and a higher lift indicates a
stronger association. The confidence level is the conditional probability that the consequent is
bought, given that the antecedent is bought. Combined, these two measures play a pivotal role in
identifying the purchasing behavior of the target customers (Abramowics, 2013).
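In the usual definitions, the support supp(X) is the share of all transactions that contain itemset X, and lift compares a rule's confidence with the benchmark confidence supp(C) that would be expected if antecedent and consequent were independent. The symmetric second form also explains why a rule and its reverse always share the same lift:

```latex
\begin{aligned}
\operatorname{lift}(A \Rightarrow C)
  &= \frac{\operatorname{conf}(A \Rightarrow C)}{\operatorname{supp}(C)}
   = \frac{\operatorname{supp}(A \cup C)}{\operatorname{supp}(A)\,\operatorname{supp}(C)} \\
\operatorname{lift} > 1 &: \text{$A$ and $C$ co-occur more often than under independence} \\
\operatorname{lift} = 1 &: \text{consistent with independence} \\
\operatorname{lift} < 1 &: \text{$A$ and $C$ co-occur less often than under independence}
\end{aligned}
```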
(ii) In the second case, the minimum confidence was set to 75%. The XLMiner output is shown
below.

The XLMiner output shows that only one rule remains, and it has 100% confidence. This is because
XLMiner excludes every rule whose confidence is below 75%; for example, Rules 2 and 3 from the
first case (63.21% and 59.19%) no longer appear. Hence, raising the minimum confidence from 50%
to 75% substantially reduced the number of rules (Abramowics, 2013).
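The effect of raising the threshold can be checked against the three confidences reported for the first case. This sketch uses only those three quoted values (the other rules from the 50% run are not included) and a hypothetical helper, `filter_by_confidence`:

```python
# Confidence values quoted for Rules 1-3 in the 50% run (only these three are used).
REPORTED_RULES = [
    ("Brushes", "Nail Polish", 1.0000),
    ("Nail Polish", "Brushes", 0.6321),
    ("Nail Polish", "Bronzer", 0.5919),
]


def filter_by_confidence(rules, min_confidence):
    """Keep only rules whose confidence meets the minimum threshold."""
    return [rule for rule in rules if rule[2] >= min_confidence]


for threshold in (0.50, 0.75):
    kept = filter_by_confidence(REPORTED_RULES, threshold)
    print(f"min confidence {threshold:.0%}: {len(kept)} of {len(REPORTED_RULES)} rules kept")
```

At 50% all three quoted rules pass; at 75% only Rule 1 (100%) remains.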

Question 2
(a) A dendrogram was produced for the given dataset using XLMiner, and the output is shown
below:
With a threshold distance of 1000, only three major clusters are formed for the given dataset.
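Cutting a dendrogram at a threshold distance can be reproduced with SciPy's `linkage` and `fcluster` (using `criterion="distance"`). The sketch is illustrative only: the seven records are hypothetical, and average linkage with Euclidean distance is an assumption, not necessarily the setting used in the XLMiner run.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical 2-D records forming three well-separated groups.
X = np.array([
    [0, 0], [10, 0], [0, 10],        # group near the origin
    [5000, 5000], [5010, 5000],      # group near (5000, 5000)
    [10000, 0], [10000, 10],         # group near (10000, 0)
], dtype=float)

# Agglomerative clustering (average linkage and Euclidean distance are assumptions).
Z = linkage(X, method="average", metric="euclidean")

# Cut the tree at distance 1000: records whose cophenetic distance is at most
# 1000 end up in the same flat cluster.
labels = fcluster(Z, t=1000, criterion="distance")
print(labels, "->", len(set(labels.tolist())), "clusters")
```

Within-group distances are about 14 or less, while the groups merge only at distances of several thousand, so a cut at 1000 yields three flat clusters.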
(b) The following issues would arise if the dataset were not normalized:
• Distance calculations would be less accurate, because the variables measured on the largest
scales would carry most of the weight in the distance.
• The results would depend on the units of measurement: variables with much larger magnitudes
would dominate cluster formation regardless of their actual importance (Shumueli, et al., 2016).
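The scale problem can be illustrated with four hypothetical customers described by an account balance and a count of flight transactions. This minimal sketch assumes z-score normalization (subtract each column's mean, divide by its standard deviation); the data and the helper names are invented for illustration.

```python
import numpy as np

# Hypothetical customers: [account balance, flight transactions last year].
X = np.array([
    [50000, 2],   # customer A
    [50600, 30],  # customer B
    [51000, 2],   # customer C
    [60000, 30],  # customer D
], dtype=float)


def zscore(data):
    """Standardize each column to mean 0 and standard deviation 1."""
    return (data - data.mean(axis=0)) / data.std(axis=0)


def nearest_neighbor(data, i):
    """Index of the record closest (Euclidean distance) to record i."""
    dist = np.linalg.norm(data - data[i], axis=1)
    dist[i] = np.inf  # ignore the record itself
    return int(np.argmin(dist))


names = "ABCD"
print("raw:       ", names[nearest_neighbor(X, 0)])
print("normalized:", names[nearest_neighbor(zscore(X), 0)])
```

Before normalization, customer A's nearest neighbor is B, because the 600-unit balance gap outweighs a 28-flight gap; after normalization, A's nearest neighbor is C, which has the same flight count.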
(c) The centroids of the different clusters are compared below:

(d) The required K-Means clustering output is listed below.
The clusters formed by the two methods differ. In the K-Means output, cluster 1 shows more than
15 flight transactions in the last year, and its other variables are also relatively high. These
records therefore represent the high-net-worth flyers, who were also indicated by cluster 1 in
the hierarchical clustering. Nevertheless, the two methods differ in how the clusters are formed
and labeled (Ragsdale, 2014).
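Because cluster numbers are arbitrary in both methods, a K-Means cluster should be matched to a hierarchical cluster by membership rather than by its number. The sketch below illustrates this with hypothetical data: it runs a minimal Lloyd's K-Means with deterministic farthest-point initialization (an assumption; XLMiner's initialization may differ) and cross-tabulates the result against a hypothetical set of hierarchical labels. `kmeans` and `crosstab` are illustrative helpers, not XLMiner functions.

```python
import numpy as np


def kmeans(X, k, n_iter=100):
    """Minimal Lloyd's algorithm; assumes no cluster becomes empty."""
    centroids = [X[0]]
    while len(centroids) < k:
        # Farthest-point initialization: next centroid is the record
        # farthest from its nearest existing centroid.
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[np.argmax(d)])
    centroids = np.array(centroids, dtype=float)
    for _ in range(n_iter):
        # Assignment step: each record joins its nearest centroid.
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Update step: move each centroid to the mean of its records.
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


def crosstab(a, b):
    """Contingency table: rows are labels in `a`, columns are labels in `b`."""
    rows, cols = np.unique(a), np.unique(b)
    table = np.array([[np.sum((a == r) & (b == c)) for c in cols] for r in rows])
    return rows, cols, table


# Hypothetical records and hypothetical hierarchical labels for them.
X = np.array([[0, 0], [10, 0], [0, 10], [5000, 5000], [5010, 5000],
              [10000, 0], [10000, 10]], dtype=float)
hier_labels = np.array([3, 3, 3, 1, 1, 2, 2])

km_labels, km_centroids = kmeans(X, k=3)
rows, cols, table = crosstab(km_labels, hier_labels)
print(table)
print("K-Means centroids:\n", np.round(km_centroids, 1))
match = {int(r): int(cols[table[i].argmax()]) for i, r in enumerate(rows)}
print("K-Means cluster -> hierarchical cluster:", match)
```

In this toy case every K-Means cluster maps cleanly onto one hierarchical cluster even though the numbers differ (K-Means cluster 0 corresponds to hierarchical cluster 3, and so on); with real data, off-diagonal counts in the table would also show records that the two methods assign differently.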
(e) The targeted clusters and the relevant offers for each are indicated below (Abramowics, 2013).

Reference
Abramowics, W. (2013) Business Information Systems Workshops: BIS 2013 International
Workshops (5th ed.). New York: Springer.
Ana, A. (2014) Integration of Data Mining in Business Intelligence System (4th ed.). Sydney:
IGI Global.
Ragsdale, C. (2014) Spreadsheet Modelling and Decision Analysis: A Practical Introduction to
Business Analytics (7th ed.). London: Cengage Learning.
Shumueli, G., Bruce, C.P., Yahav, I., Patel, R. N., Kenneth, C., & Lichtendahl, J. (2016) Data
Mining for Business Analytics: Concepts, Techniques and Applications (2nd ed.). London:
John Wiley & Sons.