Comprehensive Report on Enterprise Business Intelligence Techniques

ENTERPRISE BUSINESS INTELLIGENCE
Table of Contents
Task – 1 Manual Knowledge Discovery from the datasets
1. Apriori algorithm
2. Decision tree Induction Algorithm
3. Apriori Algorithm and FP Growth Algorithm
4. Decision Tree
Task 2 – Knowledge Discovery from the given datasets
1. Construction of Association Rules
2. J48 Decision Tree for the Car-dataset
References
Task – 1 Manual Knowledge Discovery from the datasets
1. Apriori algorithm
Generation of association rules is divided into two steps:
1. First, a minimum support threshold is applied to find all frequent item sets in the database.
2. Second, these frequent item sets and a minimum confidence constraint are used to form rules.
Finding all frequent item sets is difficult because it involves searching all possible item combinations (Li, 2010). The second step is straightforward; it is the first step that needs the most attention. The set of possible item sets is the power set of I and has size 2^n - 1 (excluding the empty set), so it grows exponentially in the number of items n in I (Tank, 2012). An efficient search therefore exploits the downward-closure property, also known as anti-monotonicity (Motoda et al., n.d.). In general, the algorithm uses breadth-first search (BFS) together with a tree structure to count candidate item sets efficiently (Schuyler, 2001).
Apriori principle
If an item set is frequent, then all of its subsets must also be frequent. Conversely, if an item set is infrequent, then all of its supersets must also be infrequent.
The Apriori principle holds because of the following property of support:
- the support of an item set never exceeds the support of any of its subsets
- this is also called the anti-monotone property of support.
Candidate rules are generated by combining two rules that share the same prefix in the rule consequent:
- Join: join(CD => AB, BD => AC) generates the candidate rule D => ABC.
- Prune: the candidate D => ABC is pruned if one of its subset rules, such as AD => BC, does not have sufficient confidence.
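To make the join step concrete, a minimal Python sketch is shown below (not from the original report): a rule is represented as an (antecedent, consequent) pair of frozensets, and joining two rules over the same frequent item set merges their consequents.

def join_rules(rule1, rule2):
    """Join two rules over the same frequent item set,
    e.g. join(CD => AB, BD => AC) yields the candidate rule D => ABC."""
    antecedent1, consequent1 = rule1
    antecedent2, consequent2 = rule2
    itemset = antecedent1 | consequent1          # both rules cover the same frequent item set
    new_consequent = consequent1 | consequent2   # AB united with AC gives ABC
    new_antecedent = itemset - new_consequent    # ABCD minus ABC gives D
    return new_antecedent, new_consequent

# Example from the text: join(CD => AB, BD => AC) produces D => ABC.
r1 = (frozenset("CD"), frozenset("AB"))
r2 = (frozenset("BD"), frozenset("AC"))
print(join_rules(r1, r2))  # (frozenset({'D'}), frozenset({'A', 'B', 'C'}))

# Pruning: the candidate D => ABC would be kept only if every rule with a smaller
# consequent over the same item set (such as AD => BC) already met the minimum confidence.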
Comparison of the Apriori and FP-Growth algorithms
Apriori
Techniques
It generates and tests candidate item sets: singletons, pairs, triples and so on.
Runtime
Candidate generation is very slow, and runtime grows exponentially with the number of distinct items.
Memory Usage
Apriori stores the candidate singletons, pairs, triples, etc.
Parallelizability
Candidate generation is highly parallelizable.
FP-Growth
Techniques
Items, ordered by frequency, are inserted into a frequent-pattern tree.
Runtime
Runtime grows linearly with the number of items and transactions.
Memory Usage
It stores a compact, compressed version of the database.
Parallelizability
The data in the tree are highly interdependent, since every path hangs off the root node, which makes parallelization difficult.
2. Decision tree Induction Algorithm
A decision tree is a tree-structured model in which each branch node represents a choice between a number of alternatives (De Ville and Neville, 2013), and each leaf node represents a decision or classification (Blobel, Hasman and Zvárová, 2013).
The algorithm operates on a set of training objects, denoted C.
If all objects in C belong to the same class P, create a leaf node labelled P and terminate; otherwise choose an attribute F and create a decision node for it.
Partition the training objects in C into subsets according to the values of F.
Apply the algorithm recursively to each subset of C.
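A minimal Python sketch of this recursion is given below (an illustration, not the report's own code; the attribute-selection function is left abstract and would normally apply the information-gain test described later, and training objects are assumed to be dictionaries with a "class" key).

from collections import Counter

def build_tree(objects, attributes, choose_attribute):
    """Recursive decision tree induction over a training set C (here 'objects',
    a list of dictionaries mapping attribute names to values, plus a 'class' key)."""
    classes = [obj["class"] for obj in objects]
    # If every object belongs to the same class P, create a leaf node P and terminate.
    if len(set(classes)) == 1:
        return classes[0]
    # If no attributes remain, fall back to the majority class.
    if not attributes:
        return Counter(classes).most_common(1)[0][0]
    # Otherwise choose an attribute F and create a decision node for it.
    best = choose_attribute(objects, attributes)
    node = {"attribute": best, "branches": {}}
    # Partition the objects by the values of F and recurse on each subset.
    for value in {obj[best] for obj in objects}:
        subset = [obj for obj in objects if obj[best] == value]
        remaining = [a for a in attributes if a != best]
        node["branches"][value] = build_tree(subset, remaining, choose_attribute)
    return node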
Converting from trees to rules
Simple way: generate one rule for each leaf.
C4.5rules: greedily prune conditions from each rule if this decreases its estimated error.
This can produce duplicate rules, so check for them at the end.
Then:
- consider each class in turn
- examine the rules for that class
- find a "good" subset of rules (guided by MDL)
- order the subsets to avoid conflicts
- finally, remove a rule if doing so reduces the error on the training data.
C4.5rules is extremely slow for large, noisy datasets. The commercial version, C5.0rules, uses a different technique and is much faster and a little more accurate.
C4.5 has two parameters:
- the confidence value (default 25%): lower values give heavier pruning
- the minimum number of objects in the two most popular branches.
Classification rules
The general procedure is divide-and-conquer (Goetz, 2011). Approaches differ in:
- the search technique (e.g. greedy search, beam search)
- the test selection criterion (e.g. accuracy)
- the pruning method (e.g. MDL, hold-out set)
- the stopping criterion (e.g. minimum accuracy)
- the post-processing step
- and whether a decision list is built or one rule set per class.
Advantages
Decision trees offer several advantages for examining alternatives:
Graphic. Decision alternatives, chance events and possible outcomes can be laid out schematically. The diagrammatic form is especially useful for sequential decisions and for dependencies between outcomes.
Efficient. Complex alternatives can be set out quickly and clearly, and a decision tree is easy to revise as new data become available.
Decision trees handle both numerical and nominal attributes.
The decision tree representation is rich enough to express any discrete-value classifier.
Decision trees are capable of handling large datasets that contain errors.
Decision trees are capable of handling datasets with missing values.
Decision trees are a nonparametric method: they make no assumptions about the distribution of the attribute space or the structure of the classifier.
Problems with decision tree algorithms
Most algorithms, such as ID3 and C4.5, require the target attribute to have only discrete values.
The divide-and-conquer strategy tends to perform well only if a few highly relevant attributes exist.
The greedy nature of decision tree construction is a further drawback worth mentioning, because it makes the tree sensitive to noise and irrelevant attributes in the training set.
Decision Tree Applications
Business Management
In recent years many companies have built their own databases to improve their customer services (Bramer, 2017). Decision trees are therefore an appropriate way to extract useful knowledge from such databases. In particular, decision tree techniques are widely used in CRM (Customer Relationship Management) and fraud detection (Grabczewski, 2016).
Customer Relationship Management
A widely used technique for managing customer relationships is to examine how individuals interact with online services (Neves-Silva, Jain and Howlett, 2015). Such an analysis is mainly achieved by storing and analysing individual usage information, and then making recommendations based on the extracted information (Kotu and Deshpande, n.d.).
Engineering
Another useful application domain for decision trees is engineering (Koh and Rountree, 2010). In particular, decision trees are widely used in energy-consumption analysis and fault detection.
3. Apriori Algorithm and FP Growth Algorithm
Pseudocode for the Apriori Algorithm
Procedure to find the association rules from the CustomerTrans data; a Python sketch of the same level-wise procedure is given after Step 2.
Step 1
Item Support
A 5
B 4
C 4
D 1
E 3
F 2
G 5
H 2
A, B, C, D, E, F, G and H denote the items in a supermarket. Support is the number of transactions in which the item appears. Let the minimum support be 3 and the minimum confidence be 70%.
Step 2
Item Support
{A,B} 2
{A,C} 3
{A,D} 0
{A,E} 3
{A,F} 1
{A,G} 4
{A,H} 1
{B,C} 2
{B,D} 1
{B,E} 0
{B,F} 2
{B,G} 4
{B,H} 2
{C,D} 1
{C,E} 2
{C,F} 1
{C,G} 3
{C,H} 0
{D,E} 0
{D,F} 1
{D,G} 1
{D,H} 0
{E,F} 0
{E,G} 2
{E,H} 0
{F,G} 2
{F,H} 1
{G,H} 2
The table above lists each candidate item pair and its support. Pairs whose support is below the minimum support value are discarded and are not considered in the next step. The algorithm does not need another iteration: it terminates at the second step, because no 3-item set reaches the minimum support of 3.
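As promised above, the level-wise counting can be sketched in a few lines of Python (a sketch only, using the seven CustomerTrans transactions listed in the FP-tree section below and the minimum support of 3 chosen above).

from itertools import combinations

transactions = [
    {"A", "B", "F", "G", "H"},
    {"B", "C", "D", "F", "G"},
    {"A", "E", "C"},
    {"A", "C", "E", "G"},
    {"A", "E", "G"},
    {"A", "B", "C", "G"},
    {"B", "G", "H"},
]
MIN_SUPPORT = 3

def support(itemset):
    """Number of transactions that contain every item of the item set."""
    return sum(1 for t in transactions if itemset <= t)

# Level 1: frequent single items.
all_items = sorted({item for t in transactions for item in t})
frequent_items = [i for i in all_items if support({i}) >= MIN_SUPPORT]

# Level 2: candidate pairs built only from frequent single items (Apriori principle),
# kept when their support reaches the minimum support.
for pair in combinations(frequent_items, 2):
    if support(set(pair)) >= MIN_SUPPORT:
        print(pair, support(set(pair)))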
Association Rules
Customers who bought item A also bought item C.
Customers who bought item A also bought item E.
Customers who bought item A also bought item G.
Customers who bought item B also bought item G.
Customers who bought item C also bought item G.
Strong Association Rules
Item A is purchased together with G in 4 out of 7 transactions.
Item B is also purchased together with G in 4 out of 7 transactions.
FP Growth Algorithm
FP Tree Construction
Transaction ID Item
1 A,B,F,G,H
2 B,C,D,F,G
3 A,E,C
4 A,C,E,G
5 A,E,G
6 A,B,C,G
7 B,G,H
After reading the transactions one by one, the FP-Growth tree is built up as shown in the following steps. The FP-Growth tree is then used to find the frequent item sets.
Step 1
After reading the Transaction ID (TID) 1, the tree is as follows
[Figure: FP-tree after reading TID 1]
Step 2
After TID 1 is completed, TID 2 is read; the tree after TID 2 is shown below.
[Figure: FP-tree after reading TID 2]
Step 3
After reading the TID 3, the tree is shown below.
[Figure: FP-tree after reading TID 3]
Step 4
After reading the transaction ID 4, the tree is represented as below.
[Figure: FP-tree after reading TID 4]
Step 5
After reading the TID 5, the FP tree is shown below.
[Figure: FP-tree after reading TID 5]
Step 6
The TID 6 is completed and the result is represented below.
[Figure: FP-tree after reading TID 6]
Step 7
After reading the last transaction, the final tree is as follows
[Figure: final FP-tree after reading TID 7]
Thus, the FP-Growth tree is constructed step by step from the conditional branches. The paths containing A, B and G are then examined: whenever A or B is present, item G appears on the same path. It follows that A is associated with G and that B is associated with G.
For example, let A be Butter and B be Fruit Jam; then G plays the role of Bread. If item A is purchased, item G is also purchased, and if B is purchased, G is purchased as well. In other words, whenever Butter or Jam is purchased, Bread is purchased too.
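A minimal Python sketch of the node insertion that builds such a tree is given below (illustrative only; a full FP-Growth implementation would first reorder each transaction's items by descending overall frequency, which is omitted here).

class FPNode:
    """One FP-tree node: an item label, a count, and children keyed by item."""
    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}

def insert_transaction(root, items):
    """Walk down from the root, incrementing counts along the shared prefix and
    creating new child nodes where the transaction's path diverges from the tree."""
    node = root
    for item in items:
        if item not in node.children:
            node.children[item] = FPNode(item)
        node = node.children[item]
        node.count += 1

# Building the tree transaction by transaction, as in Steps 1-7 above.
root = FPNode("null")
for tid_items in ["ABFGH", "BCDFG", "AEC", "ACEG", "AEG", "ABCG", "BGH"]:
    insert_transaction(root, tid_items)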
4. Decision Tree
The decision tree for the given traffic accident data is created manually. The dataset is shown below.
[Table: traffic accident data]
The decision tree is shown below.
[Figure: decision tree with nodes Traffic Violation (Disobey traffic signal (DT), Disobey stop sign (DS), Exceed speed limit (E)), Driver Condition (Sober, Alcohol-impaired), Weather Condition (Good, Bad) and Seat Belt (Yes, No)]
Entropy
a)
Crash Severity
Severe  Minor
10      6
b)
Entropy(Crash Severity) = Entropy(10, 6)
= Entropy(0.625, 0.375)
= -(0.625 log2 0.625) - (0.375 log2 0.375)
= 0.954
E(Crash Severity, Traffic Violation) = 0.72
Information Gain
a) Entropy calculation
Entropy(Crash Severity) = Entropy(10, 6)
= Entropy(0.625, 0.375)
= -(0.625 log2 0.625) - (0.375 log2 0.375)
= 0.954
Weather Condition
                  Crash Severity
                  Severe  Minor
Good              6       1
Bad               5       4
E(Crash Severity, Weather Condition) = (7/16) x Entropy(6, 1) + (9/16) x Entropy(5, 4) = 0.816
Gain = 0.954 - 0.816 = 0.138
Driver Condition
                  Crash Severity
                  Severe  Minor
Alcohol-impaired  5       2
Sober             5       4
E(Crash Severity, Driver Condition) = (7/16) x Entropy(5, 2) + (9/16) x Entropy(5, 4) = 0.935
Gain = 0.954 - 0.935 = 0.019
Seat Belt
                  Crash Severity
                  Severe  Minor
Yes               5       1
No                5       5
E(Crash Severity, Seat Belt) = (6/16) x Entropy(5, 1) + (10/16) x Entropy(5, 5) = 0.869
Gain = 0.954 - 0.869 = 0.086
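The entropy and gain figures above can be checked with a short Python snippet (a verification sketch based on the count tables above, not part of the original report).

from math import log2

def entropy(*counts):
    """Entropy of a class distribution given raw counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)

def info_gain(parent, splits):
    """Information gain = parent entropy minus the weighted entropy after the split."""
    total = sum(parent)
    remainder = sum(sum(s) / total * entropy(*s) for s in splits)
    return entropy(*parent) - remainder

print(round(entropy(10, 6), 3))                        # 0.954
print(round(info_gain((10, 6), [(6, 1), (5, 4)]), 3))  # Weather Condition: 0.138
print(round(info_gain((10, 6), [(5, 2), (5, 4)]), 3))  # Driver Condition: 0.019
print(round(info_gain((10, 6), [(5, 1), (5, 5)]), 3))  # Seat Belt: 0.086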
The information gain for Traffic Violation is calculated using the formula below.
G(Crash Severity, Traffic Violation) = E(Crash Severity) - E(Crash Severity, Traffic Violation)
= 0.954 - 0.72
= 0.234
Task 2 – Knowledge Discovery from the given datasets
1. Construction of Association Rules
Data Pre-processing
The dataset "voteM.arff" is loaded into the Weka tool in order to find association rules using the Apriori and FP-Growth algorithms.
Association rule mining using the Apriori algorithm
In the Weka Explorer, after the ARFF file is imported, the Associate tab is chosen to mine association rules with the selected algorithm. In the screenshot above, the Apriori algorithm is configured with metricType set to Confidence, minMetric set to 0.9 and numRules set to 15. The output, consisting of 15 rules with a minimum confidence of 0.9, is shown below.
Association rule mining with confidence 0.9
Association rule mining with lift 1.5
In the GenericObjectEditor, metricType is set to Lift and numRules to 15. The output of association rule mining with metricType Lift and minMetric value 1.5 is shown below.
Association Output
Association Rule Mining using FP Growth Algorithm
Report on the construction of association rules
Association rule mining is a data mining function that finds how frequently items or item sets occur together; the relationships between such co-occurring items are expressed as association rules. In this assignment two algorithms are used to mine the association rules, namely the Apriori and FP-Growth algorithms, and the Weka tool is used to find the rules. Association rule mining is first performed with the Apriori algorithm using confidence as the metric with a minimum value of 0.9, and the top 15 association rules are mined. The Apriori algorithm is then used to find the top 15 association rules with lift of 1.5 as the minimum metric value.
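For readers who want to reproduce the same mining outside Weka, a hedged sketch using the mlxtend library is shown below; the pickle file name and the support threshold are assumptions, since the report itself only used the Weka GUI on voteM.arff.

import pandas as pd
from mlxtend.frequent_patterns import apriori, fpgrowth, association_rules

# Assumed pre-processing: voteM.arff converted to a one-hot (True/False) DataFrame,
# one column per attribute=value pair, one row per instance, and saved beforehand.
df = pd.read_pickle("voteM_onehot.pkl")  # hypothetical file name

# Apriori: frequent item sets, then rules with confidence >= 0.9 (top 15 kept).
frequent = apriori(df, min_support=0.3, use_colnames=True)  # support threshold is an assumption
rules_conf = association_rules(frequent, metric="confidence", min_threshold=0.9)
print(rules_conf.sort_values("confidence", ascending=False).head(15))

# FP-Growth: the same frequent item sets, then rules ranked by lift >= 1.5.
frequent_fp = fpgrowth(df, min_support=0.3, use_colnames=True)
rules_lift = association_rules(frequent_fp, metric="lift", min_threshold=1.5)
print(rules_lift.sort_values("lift", ascending=False).head(15))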
Findings
Generated sets of large item sets
Size of set of large item sets L(1) : 28
Size of set of large item sets L(2) : 47
Size of set of large item sets L(3) : 20
Size of set of large item sets L(4) : 1
The FP-Growth algorithm is then used to mine the association rules: 21 rules are mined and the top 15 are displayed.
Finding
Top 15 association rules
2. J48 Decision Tree for the Car-dataset
[Screenshots: J48 decision tree output in Weka]
Report on running the J48 program in Weka
The screenshots above show the output of decision tree generation using J48 in the Weka tool. A J48 pruned tree is generated from the attributes buying, maint, doors, persons, lug_boot, safety and the class values, and it covers all the attribute possibilities. Buying represents the buying price of the car, maint the maintenance cost, doors the number of doors, persons the number of persons the car can carry, lug_boot the size of the luggage boot and safety the estimated safety of the car. The attribute values are: buying - v-high, high, med, low; maint - v-high, high, med, low; doors - 2, 3, 4, 5-more; persons - 2, 4, more; lug_boot - small, med, big; and safety - low, med, high. A decision tree is generated with the J48 program using all of these attributes.
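J48 is Weka's implementation of C4.5. As a rough analogue outside Weka (not what the report actually ran), a scikit-learn decision tree on the one-hot-encoded car attributes could be sketched as follows; the CSV file name is assumed, and scikit-learn uses CART with the entropy criterion rather than C4.5 itself.

import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

# Assumed CSV export of the car dataset with the attributes named in the report.
cars = pd.read_csv("car.csv")  # columns: buying, maint, doors, persons, lug_boot, safety, class
X = pd.get_dummies(cars.drop(columns=["class"]))  # one-hot encode the nominal attributes
y = cars["class"]

# The entropy criterion mirrors the information-gain test behind C4.5/J48.
tree = DecisionTreeClassifier(criterion="entropy", random_state=0)
scores = cross_val_score(tree, X, y, cv=10)  # 10-fold cross-validation, as in Weka
print(f"mean accuracy: {scores.mean():.3f}")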
Findings from running the J48 program
1. Number of leaves: 131
2. Size of the tree: 182
3. Time taken to build the model: 0.08 seconds
4. Time taken to test the model: 0.02 seconds
5. Correctly classified instances: 1664 of 1728 (about 96.3%)
6. Incorrectly classified instances: 64 (about 3.7%)
7. Kappa statistic: 0.9198
8. Mean absolute error, root mean squared error, relative absolute error and root relative squared error: 0.0248, 0.1114, 10.8411% and 32.9501% respectively.
References
Blobel, B., Hasman, A. and Zvárová, J. (2013). Data and knowledge for medical decision
support. Amsterdam: IOS Press.
Bramer, M. (2017). Principles of data mining. London: Springer.
De Ville, B. and Neville, P. (2013). Decision trees for analytics. Cary, N.C.: SAS Institute.
Goetz, T. (2011). Decision tree. New York: Rodale.
Grabczewski, K. (2016). Meta-learning in decision tree induction. Springer International Publishing.
Koh, Y. and Rountree, N. (2010). Rare association rule mining and knowledge discovery. Hershey, PA: IGI Global.
Kotu, V. and Deshpande, B. (n.d.). Predictive analytics and data mining.
Li, S. (2010). Higher order association rule mining.
Motoda, H., Wang, W., Yao, M., Zaïane, O., Cao, L. and Wu, Z. (n.d.). Advanced data mining
and applications.
Neves-Silva, R., Jain, L. and Howlett, R. (2015). Intelligent Decision Technologies. Cham:
Springer International Publishing.
Schuyler, J. (2001). Risk and decision analysis in projects. Newtown Square, Pa.: Project
Management Institute.
Tank, D. (2012). Real-Time Business Intelligence & Frequent Pattern Mining Algorithm.
Saarbrücken: LAP LAMBERT Academic Publishing.