University Project: Apriori Algorithm and Association Rules

Running head: COMPLEX ALGORITHM
COMPLEX ALGORITHM
Name of the Student
Name of the University
Author Note:
Table of Contents
Introduction
Discussion
Conclusion
References
Introduction
In 1994, R. Agrawal and R. Srikant came up with the Apriori algorithm, which is used for finding frequent itemsets in a given dataset. It produces rules of Boolean association (Bhandari, Gupta & Das, 2015). The algorithm is named "Apriori" because it makes use of prior knowledge of frequent-itemset properties.
Discussion
Every non-empty subset of a frequent itemset must itself be frequent. The key property the algorithm exploits is the anti-monotonicity of the support measure: if an itemset is infrequent, then all of its supersets are infrequent as well (Rathee, Kaul & Kashyap, 2015). Before starting the algorithm, a few working definitions are needed. Consider the following dataset, for which the frequent itemsets and the association rules among them are to be found.
TID Items
T1 I1, I2, I5
T2 I2, I4
T3 I2, I3
T4 I1, I2, I4
T5 I1, I3
T6 I2, I3
T7 I1, I3
T8 I1, I2, I3, I5
T9 I1, I2, I3
The minimum support count is 2 and the minimum confidence is 60%.
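The dataset and thresholds above can be written down directly. A minimal Python sketch follows; the items are labelled I1–I5 (as in the rule-generation section later) so they are not confused with the frequent-itemset levels L1, L2, and so on:

```python
# The nine transactions from the table above. Item labels I1..I5 are used
# to avoid a clash with the level sets L1, L2, ... of the algorithm.
transactions = {
    "T1": {"I1", "I2", "I5"},
    "T2": {"I2", "I4"},
    "T3": {"I2", "I3"},
    "T4": {"I1", "I2", "I4"},
    "T5": {"I1", "I3"},
    "T6": {"I2", "I3"},
    "T7": {"I1", "I3"},
    "T8": {"I1", "I2", "I3", "I5"},
    "T9": {"I1", "I2", "I3"},
}
MIN_SUPPORT_COUNT = 2   # an itemset must appear in at least 2 transactions
MIN_CONFIDENCE = 0.60   # 60%
```

The variable names here are illustrative choices, not part of the original text.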
Step 1: k = 1
Scan the dataset and count the support of each item; this candidate set is called C1.
Itemset Sup_count
I1 6
I2 7
I3 6
I4 2
I5 2
Every item meets the minimum support count of 2, so all five form the set of frequent 1-itemsets, L1.
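Counting C1 takes a single scan over the transactions. A small Python sketch (item labels I1–I5, matching the rule-generation section; this is an illustration, not the author's code):

```python
from collections import Counter

# the nine transactions of the worked example, items labelled I1..I5
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]

# C1: support count of every individual item (one pass over the data)
C1 = Counter(item for t in transactions for item in t)

# keep the items meeting the minimum support count of 2 -> the set L1
L1 = {item: n for item, n in C1.items() if n >= 2}
```

Here every item survives the cut, so L1 carries the counts {I1: 6, I2: 7, I3: 6, I4: 2, I5: 2}, exactly the table above.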
Step 2: k = 2
Generate candidate set C2 by joining L1 with itself (join step). The condition for joining two itemsets of Lk-1 is that they have their first (k-2) elements in common; for k = 2 this is vacuous, so every pair of items is a candidate.
Check whether every subset of each candidate itemset is frequent (Khalili & Sami, 2015); if any subset is infrequent, remove that candidate.
Now find the support count of each remaining candidate by searching the dataset.
Itemset Sup_count
I1, I2 4
I1, I3 4
I1, I4 1
I1, I5 2
I2, I3 4
I2, I4 2
I2, I5 2
I3, I4 0
I3, I5 1
I4, I5 0
Compare the support count of each candidate in C2 with the minimum support count and discard the pairs that fall below it; the surviving itemsets form the frequent 2-itemsets, L2.
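The C2 scan-and-prune step can be sketched as follows (a straightforward formulation under the relabelling I1–I5, not the author's code):

```python
from itertools import combinations

transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
items = sorted({i for t in transactions for i in t})

# C2: every 2-item candidate, with its support count from one dataset scan
C2 = {frozenset(p): sum(1 for t in transactions if t >= set(p))
      for p in combinations(items, 2)}

# compare against the minimum support count of 2 -> the frequent pairs L2
L2 = {s: n for s, n in C2.items() if n >= 2}
```

Of the ten candidate pairs, six survive, matching the table: {I1, I4}, {I3, I4}, {I3, I5} and {I4, I5} are discarded.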
Step 3: k = 3
Generate candidate set C3 by joining L2 with itself (join step). The joining condition is again that two itemsets of Lk-1 share their first (k-2) elements; here the first element of the two 2-itemsets must match. The itemsets generated by joining L2 are {(I1, I2, I3), (I1, I2, I5), (I1, I3, I5), (I2, I3, I4), (I2, I4, I5), (I2, I3, I5)}.
Check whether every subset of each candidate is frequent (Xi et al., 2016); if not, remove that candidate. For example, (I1, I3, I5) is pruned because its subset (I3, I5) is infrequent, and the same check eliminates (I2, I3, I4), (I2, I4, I5) and (I2, I3, I5). Find the support counts of the two remaining candidates by searching the dataset.
Itemset Sup_count
I1, I2, I3 2
I1, I2, I5 2
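The join and prune steps for C3 can be sketched as follows; the join condition (first k-2 = 1 elements equal) and the subset check mirror the description above, with the L2 pairs taken from the earlier table:

```python
from itertools import combinations

# frequent 2-itemsets L2, as sorted lists (their counts are not needed here)
L2 = [["I1", "I2"], ["I1", "I3"], ["I1", "I5"],
      ["I2", "I3"], ["I2", "I4"], ["I2", "I5"]]
L2_sets = {frozenset(s) for s in L2}

# join step: merge two 2-itemsets whose first (k-2) = 1 elements agree
C3 = set()
for a, b in combinations(L2, 2):
    if a[:-1] == b[:-1]:
        C3.add(frozenset(a + [b[-1]]))

# prune step: every 2-item subset of a candidate must itself be in L2
C3_pruned = {c for c in C3
             if all(frozenset(s) in L2_sets for s in combinations(c, 2))}
```

The join yields six candidates; pruning leaves only {I1, I2, I3} and {I1, I2, I5}, whose supports are then counted in the dataset.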
Step 4: k = 4
Generate candidate set C4 by joining L3 with itself (join step). The joining condition for k = 4 is that the two 3-itemsets share their first (k-2) = 2 elements.
Joining L3 produces the single candidate (I1, I2, I3, I5). Its subset (I1, I3, I5) is not frequent, so the candidate is pruned and C4 is empty.
The algorithm stops here, because no further frequent itemsets can be found.
Confidence: a confidence of 60% for a rule such as {milk, bread} => {butter} means that 60% of the customers who bought milk and bread also bought butter (Xi et al., 2016). Using the frequent itemsets found above, association rules can now be generated.
Itemset {I1, I2, I3} // from L3
So the candidate rules are:
[I1^I2] => [I3] // confidence = sup(I1^I2^I3)/sup(I1^I2) = 2/4*100 = 50%
[I1^I3] => [I2] // confidence = sup(I1^I2^I3)/sup(I1^I3) = 2/4*100 = 50%
[I2^I3] => [I1] // confidence = sup(I1^I2^I3)/sup(I2^I3) = 2/4*100 = 50%
[I1] => [I2^I3] // confidence = sup(I1^I2^I3)/sup(I1) = 2/6*100 ≈ 33%
[I2] => [I1^I3] // confidence = sup(I1^I2^I3)/sup(I2) = 2/7*100 ≈ 29%
[I3] => [I1^I2] // confidence = sup(I1^I2^I3)/sup(I3) = 2/6*100 ≈ 33%
At the stated minimum confidence of 60%, none of these rules qualifies; had the threshold been 50%, the first three would be considered strong association rules.
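Rule generation from {I1, I2, I3} can be reproduced with the support counts from the tables above (a sketch, not the author's code):

```python
from itertools import combinations

# support counts taken from the worked example above
sup = {frozenset(k): v for k, v in [
    (("I1",), 6), (("I2",), 7), (("I3",), 6),
    (("I1", "I2"), 4), (("I1", "I3"), 4), (("I2", "I3"), 4),
    (("I1", "I2", "I3"), 2),
]}
itemset = frozenset({"I1", "I2", "I3"})

# every non-empty proper subset of the itemset is a candidate antecedent
rules = {}
for r in (1, 2):
    for ante in combinations(sorted(itemset), r):
        a = frozenset(ante)
        # confidence = sup(itemset) / sup(antecedent)
        rules[(ante, tuple(sorted(itemset - a)))] = sup[itemset] / sup[a]

strong = {rule for rule, conf in rules.items() if conf >= 0.5}
```

With a 50% threshold, exactly the three rules with two-item antecedents survive, matching the list above.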
Conclusion
Apriori is used to mine frequent (repeating) patterns in a transaction dataset; in particular, it finds frequent items and the associations between different items. It applies an iterative, level-wise search in which the frequent k-itemsets are used to find the (k+1)-itemsets. To improve the overall efficiency of the level-wise generation of
frequent itemsets, an important property, known as the Apriori property, is used. The Apriori property is very helpful in reducing the overall search space.
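Putting the level-wise search together, a compact end-to-end sketch (an assumed structure for illustration, not the author's implementation):

```python
from collections import defaultdict
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise search: frequent k-itemsets seed the (k+1)-candidates."""
    items = sorted({i for t in transactions for i in t})
    candidates = [(i,) for i in items]       # C1 as sorted tuples
    frequent = {}                            # all frequent itemsets found
    k = 1
    while candidates:
        counts = defaultdict(int)
        for t in transactions:               # one scan per level
            for c in candidates:
                if set(c) <= t:
                    counts[c] += 1
        Lk = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(Lk)
        # join itemsets sharing their first k-1 items, then prune any
        # candidate that has an infrequent k-subset (Apriori property)
        keys = sorted(Lk)
        candidates = [a + (b[-1],) for a, b in combinations(keys, 2)
                      if a[:-1] == b[:-1]
                      and all(s in Lk for s in combinations(a + (b[-1],), k))]
        k += 1
    return frequent

freq = apriori([
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
], min_support=2)
```

On the worked example this returns thirteen frequent itemsets: five singletons, six pairs and the two triples {I1, I2, I3} and {I1, I2, I5}.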
Document Page
8COMPLEX ALGORITHM
References
Bhandari, A., Gupta, A., & Das, D. (2015). Improvised apriori algorithm using frequent pattern
tree for real time applications in data mining. Procedia Computer Science, 46, 644-651.
Khalili, A., & Sami, A. (2015). SysDetect: a systematic approach to critical state determination
for Industrial Intrusion Detection Systems using Apriori algorithm. Journal of Process
Control, 32, 154-160.
Rathee, S., Kaul, M., & Kashyap, A. (2015, October). R-Apriori: An efficient Apriori-based algorithm on Spark. In Proceedings of the 8th PhD Workshop in Information and Knowledge Management (pp. 27-34). ACM.
Xi, J., Zhao, Z., Li, W., & Wang, Q. (2016). A traffic accident causation analysis method based on AHP-Apriori. Procedia Engineering, 137, 680-687.