Data Mining & Visualization for Business Intelligence - Assignment 3

Added on 2020/03/16

AI Summary

This assignment solution focuses on data mining and visualization techniques for business intelligence. It explores association rules, providing interpretations of the first three rules, discussing the criteria for evaluating rule efficiency, and analyzing the impact of confidence level settings. The solution then delves into cluster analysis, detailing the pre-defined number of clusters, the effects of data normalization, and the use of K-means clustering with five centroids. It highlights the convergence achieved through iterations and the similarity of results from hierarchical and K-means methods. Finally, the solution suggests strategies for making targeted offers to customers based on cluster analysis, emphasizing the importance of understanding customer personas and tailoring rewards to maximize customer retention and revenue generation. References to relevant research papers are also included.

Data Mining and Visualization for Business Intelligence
Assignment - 3
Contents
1.1 Association Rules
1.1.1 I)
1.1.2 II)
1.1.3 III)
1.2 Cluster Analysis
1.2.1 A)
1.2.2 B)
1.2.3 C)
1.2.4 D)
1.2.5 E)
1.1 Association Rules
1.1.1 I)
Interpretation of the first 3 rules:
Rule 1:
When a customer buys Brushes & Concealer (the antecedent, A), they also tend to buy
Bronzer & Nail Polish (the consequent, C). The antecedent appears in 77 transactions
and the consequent in 103, and the two occur together in 62 transactions. The
confidence is therefore 62/77, about 80%, and the lift ratio is 3.90 (Gupta, Garg, &
Sharma, 2014; Rajak & Gupta, 2008; Sujatha & CH, 2011).
Rule 2:
This is the reverse of Rule 1: when a customer buys Nail Polish & Bronzer, they also
tend to buy Brushes & Concealer. Here the support for A is 103 and for C is 77, and
the two again occur together in 62 transactions. The confidence of this rule is
lower than that of Rule 1: 62/103, or about 60%.
Rule 3:
If a customer purchases Nail Polish, Concealer & Bronzer together, they also tend to
buy Brushes. This rule has a confidence of 81%.
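The confidence and lift figures can be checked from the transaction counts quoted for Rules 1 and 2. The sketch below is only an illustration: the function names are hypothetical, and the total number of transactions is not stated in the text, so a value of 500 is assumed because it roughly reproduces the quoted lift.

```python
# Counts quoted in the text for Rules 1 and 2.
count_a = 77    # transactions containing Brushes & Concealer
count_c = 103   # transactions containing Bronzer & Nail Polish
count_ac = 62   # transactions containing both item sets

# ASSUMPTION: the total number of transactions is not given in the text.
n_transactions = 500


def confidence(count_both, count_antecedent):
    """Share of antecedent transactions that also contain the consequent."""
    return count_both / count_antecedent


def lift(count_both, count_antecedent, count_consequent, n):
    """Confidence divided by the baseline frequency of the consequent."""
    return confidence(count_both, count_antecedent) / (count_consequent / n)


rule1_conf = confidence(count_ac, count_a)   # A -> C
rule2_conf = confidence(count_ac, count_c)   # C -> A (the reverse rule)
rule1_lift = lift(count_ac, count_a, count_c, n_transactions)
rule2_lift = lift(count_ac, count_c, count_a, n_transactions)

print(f"Rule 1: confidence={rule1_conf:.2f}, lift={rule1_lift:.2f}")
print(f"Rule 2: confidence={rule2_conf:.2f}, lift={rule2_lift:.2f}")
```

Both rules share the same lift because lift is symmetric in the antecedent and the consequent; only the confidence depends on the direction of the rule.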

1.1.2 II)
Several criteria help to judge how useful the rules generated by the algorithm are.
First, we need to examine the confidence of each rule, which shows how often the consequent
appears in transactions that contain the antecedent. A rule should also be logical and backed
by business understanding. For example, Rule 6 has a confidence of more than 80% and a lift
ratio of 3.7, and it also makes logical sense. Hence, it can be considered an efficient rule to apply.
1.1.3 III)
When the minimum confidence is set at 75%, the number of association rules is
reduced. This is because the algorithm keeps only those rules whose confidence is
greater than or equal to 75%. Confidence is calculated as the ratio of the support
for A & C together to the support for A alone. Hence, a rule qualifies only if a
larger share of the transactions containing the antecedent also contain the consequent.
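The effect of the threshold can be sketched with a small filter. This is an illustration under stated assumptions, not the procedure used by the software: the `Rule` class and `filter_rules` function are hypothetical names, and the only rules included are the two whose counts are quoted in section 1.1.1.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    antecedent: tuple
    consequent: tuple
    count_antecedent: int   # transactions containing the antecedent
    count_both: int         # transactions containing antecedent and consequent

    @property
    def confidence(self):
        # Support for A & C divided by support for A alone.
        return self.count_both / self.count_antecedent


def filter_rules(rules, min_confidence):
    """Keep only rules whose confidence meets the minimum threshold."""
    return [r for r in rules if r.confidence >= min_confidence]


# Counts taken from Rules 1 and 2 in section 1.1.1.
rules = [
    Rule(("Brushes", "Concealer"), ("Bronzer", "Nail Polish"), 77, 62),
    Rule(("Bronzer", "Nail Polish"), ("Brushes", "Concealer"), 103, 62),
]

for threshold in (0.50, 0.75):
    kept = filter_rules(rules, threshold)
    print(f"min confidence {threshold:.0%}: {len(kept)} rule(s) kept")
```

Raising the threshold can only remove rules from the list, never add them, which is why the rule count falls as the minimum confidence rises.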

1.2 Cluster Analysis
1.2.1 A)
A total of 5 clusters was specified in advance, before running the algorithm in the
software. Fixing the number of clusters in advance helps the algorithm reach convergence.
1.2.2 B)
When the data is not normalized, the scale of each variable affects the calculated
distance, so variables measured on larger scales dominate the distance measure.
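A small sketch can make the scale effect concrete. The three customers and the two variables (balance and number of transactions) are hypothetical and are not taken from the assignment data; z-score standardization is assumed as the normalization method.

```python
import math
import statistics

# HYPOTHETICAL customers: (balance, number of transactions).
customers = [(50000.0, 2.0), (52000.0, 30.0), (40000.0, 3.0)]


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def zscore_columns(rows):
    """Standardize each column to mean 0 and standard deviation 1.

    Assumes no column is constant (a zero standard deviation would
    cause a division by zero).
    """
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    stdevs = [statistics.pstdev(c) for c in columns]
    return [
        tuple((v - m) / s for v, m, s in zip(row, means, stdevs))
        for row in rows
    ]


# Distances from customer 0 on the raw scale: balance dominates.
raw_d01 = euclidean(customers[0], customers[1])
raw_d02 = euclidean(customers[0], customers[2])

# Distances from customer 0 after normalization: both variables count.
scaled = zscore_columns(customers)
norm_d01 = euclidean(scaled[0], scaled[1])
norm_d02 = euclidean(scaled[0], scaled[2])

print(f"raw:        d(0,1)={raw_d01:.1f}  d(0,2)={raw_d02:.1f}")
print(f"normalized: d(0,1)={norm_d01:.2f}  d(0,2)={norm_d02:.2f}")
```

On the raw scale customer 0 looks closest to customer 1 because their balances are similar, even though their transaction counts differ widely. After normalization customer 0 is closer to customer 2, whose transaction count is almost the same.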
1.2.3 C)
[Figure: Dendrogram]
There are 5 clusters marked with different colors.
1.2.4 D)
We ran K-means clustering with a total of five centroids. With the
number of iterations set at 50, the algorithm converged (it found the
optimal clusters based on distance). The clusters given by the
hierarchical and K-means methods are the same, so both techniques
produce the same results.
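The K-means procedure described here can be sketched in plain Python. This is a minimal illustration of the standard algorithm with five centroids and an iteration cap of 50, not the implementation used by the software. The `kmeans` function, the random initialization, and the synthetic two-dimensional points are all assumptions made for the example.

```python
import math
import random


def kmeans(points, k=5, max_iter=50, seed=0):
    """Basic K-means. Returns (centroids, labels, iterations, converged)."""
    rng = random.Random(seed)
    # ASSUMPTION: initial centroids are k randomly chosen data points.
    centroids = rng.sample(points, k)
    labels = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            # No point changed cluster, so the algorithm has converged.
            return centroids, labels, iteration, True
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels, max_iter, False


# HYPOTHETICAL data: 10 points scattered around each of 5 centers.
data_rng = random.Random(42)
centers = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 5)]
points = [
    (cx + data_rng.gauss(0, 0.5), cy + data_rng.gauss(0, 0.5))
    for cx, cy in centers
    for _ in range(10)
]

centroids, labels, n_iter, converged = kmeans(points, k=5, max_iter=50)
print(f"converged={converged} after {n_iter} iteration(s)")
```

Convergence here means only that the assignments stopped changing. K-means can settle on a local optimum that depends on the starting centroids, so comparing the result with the hierarchical clusters, as done above, is a useful cross-check.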

1.2.5 E)
Before making offers to customers based on the clusters, each cluster
must be examined to understand the personas of the people that fall
into it. Based on that understanding, each cluster should be validated.
For example, customers with a higher balance should typically fall in
one cluster. This is observed in the results obtained: they all come in
cluster 4. So, people in cluster 4 can be offered more reward points to
retain them, as they generate higher revenues for the business. Sending
them gifts or offering preferred seat selection can show appreciation
for their loyalty. People who generally do not make transactions are
grouped in cluster 1. Offers specific to this segment, such as discounts
and reward points, can encourage them to transact so that the business
can extract more value from the segment (Correa, González, Nieto, &
Amezquita, 2012; Iaci & Singh, 2012; Trebuna, Halcinova, & Fil’o,
2014).
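One way to examine the clusters before designing offers is to profile each one by its average values. The sketch below uses hypothetical customer records and a hypothetical `profile_clusters` function; the numbers are invented to mirror the pattern described above and are not the assignment data.

```python
from collections import defaultdict
from statistics import mean

# HYPOTHETICAL records: (cluster label, balance, number of transactions).
customers = [
    (4, 120000, 25), (4, 150000, 30),
    (1, 8000, 0), (1, 5000, 1),
    (2, 40000, 10), (3, 30000, 6), (5, 20000, 4),
]


def profile_clusters(rows):
    """Summarize each cluster by size and average values."""
    grouped = defaultdict(list)
    for cluster, balance, transactions in rows:
        grouped[cluster].append((balance, transactions))
    return {
        cluster: {
            "size": len(members),
            "mean_balance": mean(b for b, _ in members),
            "mean_transactions": mean(t for _, t in members),
        }
        for cluster, members in grouped.items()
    }


profiles = profile_clusters(customers)
high_value = max(profiles, key=lambda c: profiles[c]["mean_balance"])
least_active = min(profiles, key=lambda c: profiles[c]["mean_transactions"])

print(f"highest mean balance: cluster {high_value}")
print(f"fewest transactions:  cluster {least_active}")
```

A profile like this supports the validation step: if the high-balance customers did not concentrate in one cluster, the clustering would need to be revisited before any offers are tied to it.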

References
Correa, A., González, A., Nieto, C., & Amezquita, D. (2012). Constructing a Credit Risk
Scorecard using Predictive Clusters. SAS Global Forum.
Gupta, A. K., Garg, R. R., & Sharma, V. K. (2014). Association Rule Mining Techniques
between Set of Items. International Journal of Intelligent Computing and Informatics, 1(1).
Iaci, R., & Singh, A. K. (2012). Clustering high dimensional sparse casino player tracking
datasets. UNLV Gaming Research & Review Journal, 16(1), 21–43.
Rajak, A., & Gupta, M. (2008). Association Rule Mining: Applications in Various Areas. In
International Conference on Data Management.