Data Mining Techniques and Applications
Added on 2020/03/28
This assignment delves into the practical application of two key data mining techniques: association rule mining and clustering. Students analyze a provided dataset using these methods to identify interesting relationships between customer attributes and their transaction history. The analysis involves generating association rules that highlight frequently occurring item combinations and employing K-means clustering to segment customers based on their behavior. The assignment emphasizes the importance of understanding both techniques, interpreting the results, and drawing meaningful conclusions for business applications such as targeted marketing and customer retention.
Data Mining and Visualization for Business Intelligence
Assignment - 3
[Pick the date]
Student Name
Contents
1.1 Association Rules
1.1.1 I)
1.1.2 II)
1.1.3 III)
1.2 Cluster Analysis
1.2.1 A)
1.2.2 B)
1.2.3 C)
1.2.4 D)
1.2.5 E)
Row ID  Confidence %  Antecedent (A)                     Consequent (C)         Support for A  Support for C  Support for A & C  Lift Ratio
1       80.51948052   Brushes & Concealer                Nail Polish & Bronzer  77             103            62                 3.908712647
2       60.19417476   Nail Polish & Bronzer              Brushes & Concealer    103            77             62                 3.908712647
3       81.57894737   Nail Polish & Concealer & Bronzer  Brushes                76             110            62                 3.708133971
1.1 Association Rules
1.1.1 I)
Rule 1 implies that when a customer purchases Brushes & Concealer together, they also
buy Nail Polish & Bronzer with a confidence of about 80.5%. The support for the
antecedent A is 77, which is the number of transactions that contain A, while 103
transactions contain the consequent C. A and C occur together in 62 transactions. The
lift ratio of about 3.91 indicates that a customer who buys Brushes & Concealer is
about 3.9 times as likely to buy Nail Polish & Bronzer as a customer drawn at random
from all transactions.
According to Rule 2, when a customer purchases Nail Polish & Bronzer, they also tend
to purchase Brushes & Concealer. This is supported by the number of transactions
falling into each event: the support for A is 103 and the support for C is 77. Rule 2
is the reverse of Rule 1 (antecedent and consequent are swapped), which is why both
rules have the same lift ratio. The confidence of Rule 2 is lower (about 60.2%)
because its antecedent occurs in more transactions.
According to Rule 3, if a customer purchases Nail Polish, Concealer & Bronzer together,
they also purchase Brushes with a confidence of about 81.6% (Gupta, Garg, & Sharma,
2014; Rajak & Gupta, 2008; Sujatha & CH, 2011).
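The confidence and lift figures discussed above can be reproduced from the support
counts. The sketch below is only illustrative: the function names are made up for this
example, and the total of 500 transactions is an assumption inferred from the reported
lift ratios, since the text does not state the size of the dataset.

```python
# Assumption: 500 transactions in total. This number is not stated in the
# report; it is the value that reproduces the reported lift ratios.
N_TRANSACTIONS = 500


def confidence(support_a_and_c, support_a):
    """Share of transactions containing A that also contain C."""
    return support_a_and_c / support_a


def lift(support_a_and_c, support_a, support_c, n_transactions):
    """Confidence divided by the overall share of transactions containing C."""
    benchmark = support_c / n_transactions
    return confidence(support_a_and_c, support_a) / benchmark


# Rule 1: Brushes & Concealer -> Nail Polish & Bronzer
rule1_conf = confidence(62, 77)
rule1_lift = lift(62, 77, 103, N_TRANSACTIONS)

# Rule 2 swaps antecedent and consequent: confidence changes, lift does not.
rule2_conf = confidence(62, 103)
rule2_lift = lift(62, 103, 77, N_TRANSACTIONS)

print(f"Rule 1: confidence {rule1_conf:.2%}, lift {rule1_lift:.3f}")
print(f"Rule 2: confidence {rule2_conf:.2%}, lift {rule2_lift:.3f}")
```

Because lift depends only on the three support counts and the total, swapping A and C
leaves it unchanged, while confidence changes with the support of the antecedent.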
1.1.2 II)
Row ID  Confidence %  Antecedent (A)                     Consequent (C)                     Support for A  Support for C  Support for A & C  Lift Ratio
1       80.51948052   Brushes & Concealer                Nail Polish & Bronzer              77             103            62                 3.908712647
2       60.19417476   Nail Polish & Bronzer              Brushes & Concealer                103            77             62                 3.908712647
3       81.57894737   Nail Polish & Concealer & Bronzer  Brushes                            76             110            62                 3.708133971
4       56.36363636   Brushes                            Nail Polish & Concealer & Bronzer  110            76             62                 3.708133971
5       76.36363636   Brushes                            Nail Polish & Bronzer              110            103            84                 3.706972639
6       81.55339806   Nail Polish & Bronzer              Brushes                            103            110            84                 3.706972639
7       73.80952381   Brushes & Bronzer                  Nail Polish & Concealer            84             109            62                 3.385757973
8       56.88073394   Nail Polish & Concealer            Brushes & Bronzer                  109            84             62                 3.385757973
9       70.64220183   Nail Polish & Concealer            Brushes                            109            110            77                 3.211009174
10      70            Brushes                            Nail Polish & Concealer            110            109            77                 3.211009174
11      67.07317073   Blush & Nail Polish                Brushes                            82             110            55                 3.048780488
12      50            Brushes                            Blush & Nail Polish                110            82             55                 3.048780488
Several criteria help to judge how useful a rule is. First, we examine the confidence level,
which shows how often the consequent appears in transactions that contain the antecedent. The
rule should also be logical and backed by business understanding. For example, Rule 6 has a
confidence level above 80% and a lift ratio of about 3.7, and it also makes logical sense.
Hence, this rule can be considered an efficient rule to apply.
1.1.3 III)
When the minimum confidence is set at 75%, the number of association rules is reduced,
because the algorithm keeps only those rules whose confidence is greater than or equal
to 75%. Confidence is calculated as the ratio of the support for A & C to the support
for A alone. Hence, a larger share of the transactions containing the antecedent must
also contain the consequent for a rule to qualify.
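The effect of the 75% threshold can be checked against the 12 rules listed in 1.1.2.
The following is a minimal sketch: the support counts are copied from that table, while
the names RULES, confidence_pct and rules_meeting are invented for this illustration and
do not come from the software used in the assignment.

```python
# (row ID, antecedent, consequent, support for A, support for A & C),
# copied from the 12-rule table in 1.1.2.
RULES = [
    (1, "Brushes & Concealer", "Nail Polish & Bronzer", 77, 62),
    (2, "Nail Polish & Bronzer", "Brushes & Concealer", 103, 62),
    (3, "Nail Polish & Concealer & Bronzer", "Brushes", 76, 62),
    (4, "Brushes", "Nail Polish & Concealer & Bronzer", 110, 62),
    (5, "Brushes", "Nail Polish & Bronzer", 110, 84),
    (6, "Nail Polish & Bronzer", "Brushes", 103, 84),
    (7, "Brushes & Bronzer", "Nail Polish & Concealer", 84, 62),
    (8, "Nail Polish & Concealer", "Brushes & Bronzer", 109, 62),
    (9, "Nail Polish & Concealer", "Brushes", 109, 77),
    (10, "Brushes", "Nail Polish & Concealer", 110, 77),
    (11, "Blush & Nail Polish", "Brushes", 82, 55),
    (12, "Brushes", "Blush & Nail Polish", 110, 55),
]


def confidence_pct(support_a, support_a_and_c):
    """Confidence in percent: support(A & C) / support(A)."""
    return 100.0 * support_a_and_c / support_a


def rules_meeting(min_confidence_pct, rules=RULES):
    """Row IDs of the rules whose confidence is at least the threshold."""
    return [
        row_id
        for row_id, _antecedent, _consequent, supp_a, supp_ac in rules
        if confidence_pct(supp_a, supp_ac) >= min_confidence_pct
    ]


kept = rules_meeting(75.0)
print(f"{len(kept)} of {len(RULES)} rules remain: {kept}")
```

Of the 12 listed rules, only rules 1, 3, 5 and 6 have a confidence of 75% or more.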
1.2 Cluster Analysis
1.2.1 A)
There are 5 clusters, a number that we specified in advance for the algorithm in the
software. Fixing the number of clusters beforehand gives the algorithm a definite
partition to converge to.
1.2.2 B)
When the data are not normalized, the scale of each variable affects the calculated
distance, so variables measured on a large scale dominate the distance measure.
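A small numerical illustration of this point is sketched below. The three customers and
their two variables (an account balance and a transaction count) are hypothetical values
chosen for the example, not data from the assignment, and z-score standardization is
used as one common way of normalizing.

```python
import math
from statistics import mean, pstdev

# Hypothetical customers: (balance, number of transactions).
customers = [(20000.0, 2.0), (21000.0, 30.0), (60000.0, 3.0)]


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def zscore_columns(rows):
    """Rescale every column to mean 0 and standard deviation 1.

    Assumes no column is constant (a constant column has zero deviation).
    """
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return [
        tuple((value - m) / s for value, m, s in zip(row, means, stds))
        for row in rows
    ]


# Raw units: the balance difference swamps the transaction difference.
raw_d01 = euclidean(customers[0], customers[1])
raw_d02 = euclidean(customers[0], customers[2])

# Standardized units: both variables contribute on a comparable scale.
scaled = zscore_columns(customers)
scaled_d01 = euclidean(scaled[0], scaled[1])
scaled_d02 = euclidean(scaled[0], scaled[2])

print(f"raw:    d(0,1) = {raw_d01:.1f}, d(0,2) = {raw_d02:.1f}")
print(f"scaled: d(0,1) = {scaled_d01:.3f}, d(0,2) = {scaled_d02:.3f}")
```

In raw units, customer 0 looks far closer to customer 1 than to customer 2, even though
customers 0 and 1 differ greatly in transaction count. After standardization the two
distances are of similar size, because neither variable dominates.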
1.2.3 C)
[Figure: Dendrogram with 30 leaf labels (1 to 30)]
There are 5 clusters marked with different colors.
1.2.4 D)
We ran K-means clustering with five centroids. With the maximum number
of iterations set at 50, the algorithm converged (it found the optimal
clusters based on distance). The clusters given by the hierarchical and
K-means methods are the same, so both techniques produce the same
results.
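The K-means procedure described above can be sketched in a few lines. This is a generic
illustration of the standard (Lloyd's) algorithm, not the routine of the software used
in the assignment. The function name, the optional init argument, and the six sample
points are assumptions made for the example; only the cap of 50 iterations comes from
the text. Two clusters are used here to keep the data small.

```python
import math
import random


def kmeans(points, k, max_iter=50, init=None, seed=0):
    """Lloyd's algorithm. Returns (labels, centroids, iterations_used).

    Assumes max_iter >= 1. If init is None, k points are sampled as the
    starting centroids using the given seed.
    """
    rng = random.Random(seed)
    centroids = list(init) if init is not None else rng.sample(points, k)
    labels = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels, centroids, iteration


# Hypothetical, already normalized observations forming two obvious groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids, n_iter = kmeans(points, k=2, init=[points[0], points[3]])
print(labels, centroids, n_iter)
```

The loop stops as soon as an assignment step changes no labels, which is what reaching
convergence before the iteration cap means. When comparing K-means labels with clusters
cut from a dendrogram, the cluster numbers may differ even if the groupings match.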
1.2.5 E)
Before targeting customers with different offers for each cluster, each
cluster must be examined to understand the customer personas that fall
into it. Based on that understanding, each cluster should be validated.
For example, customers with higher balances should typically fall in one
cluster. This is observed in the results obtained: they all fall in
cluster 4. So, people in cluster 4 can be offered more reward points to
retain them, as they generate higher revenues for the business. Sending
them gifts or offering preferred seat selection can show appreciation
for their loyalty. People who generally do not transact are grouped in
cluster 1. Offers such as discounts and reward points specific to this
segment can be designed so that the business can extract more value
from it (Correa, González, Nieto, & Amezquita, 2012; Iaci & Singh, 2012;
Trebuna, Halcinova, & Fil’o, 2014).
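One simple way to examine the clusters before designing offers is to average each
variable within each cluster. The sketch below assumes the cluster labels are already
attached to each customer record. The records, the field names (balance, transactions)
and the function name are hypothetical and serve only to illustrate the step.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical customer records with a cluster label already assigned.
customers = [
    {"id": 1, "cluster": 4, "balance": 90000, "transactions": 14},
    {"id": 2, "cluster": 4, "balance": 100000, "transactions": 10},
    {"id": 3, "cluster": 1, "balance": 5000, "transactions": 0},
    {"id": 4, "cluster": 1, "balance": 7000, "transactions": 1},
    {"id": 5, "cluster": 2, "balance": 30000, "transactions": 6},
]


def profile_clusters(records, fields):
    """Cluster size and per-field mean for every cluster label."""
    groups = defaultdict(list)
    for record in records:
        groups[record["cluster"]].append(record)
    return {
        cluster: {
            "size": len(rows),
            **{field: mean(row[field] for row in rows) for field in fields},
        }
        for cluster, rows in sorted(groups.items())
    }


profiles = profile_clusters(customers, ["balance", "transactions"])
for cluster, summary in profiles.items():
    print(cluster, summary)
```

A profile of this kind makes it easy to confirm, for instance, that the high-balance
customers sit in one cluster and the inactive customers in another before any offer is
assigned to a segment.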
References
Correa, A., González, A., Nieto, C., & Amezquita, D. (2012). Constructing a Credit Risk
Scorecard using Predictive Clusters. SAS Global Forum.
Gupta, A. K., Garg, R. R., & Sharma, V. K. (2014). Association Rule Mining Techniques
between Set of Items. International Journal of Intelligent Computing and Informatics, 1(1).
Iaci, R., & Singh, A. K. (2012). Clustering high dimensional sparse casino player tracking
datasets. UNLV Gaming Research & Review Journal, 16(1), 21–43.
Rajak, A., & Gupta, M. (2008). Association Rule Mining: Applications in Various Areas. In
International Conference on Data Management.
Sujatha, D., & CH, N. (2011). Quantitative Association Rule Mining on Weighted Transactional
Data. International Journal of Information and Education Technology, 1(3).
Trebuna, P., Halcinova, J., & Fil’o, M. (2014). The importance of normalization and
standardization in the process of clustering. IEEE, 12, 381.