Building Decision Trees: An ID3 Algorithm Approach to Recidivism


Added on 2023/06/07

The first step in building the decision tree is to determine which of the three descriptive features is the best one on which to split the dataset at the root node (i.e., which descriptive feature has the highest information gain). The total entropy for this dataset is computed as follows:
H(RECIDIVIST, D) = -((3/6) × log2(3/6) + (3/6) × log2(3/6)) = 1.00 bit
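This entropy calculation can be sketched in a few lines of Python; a minimal sketch, assuming only that the RECIDIVIST column has three true and three false values, as the formula above states:

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = [labels.count(v) for v in set(labels)]
    return -sum((c / n) * math.log2(c / n) for c in counts)

# RECIDIVIST column of the six-instance dataset: three true, three false
recidivist = [True, False, True, False, True, False]
print(round(entropy(recidivist), 2))  # 1.0
```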
The table below shows the computation of the information gain for each feature:

Split by Feature | Level | Part. | Instances          | Partition Entropy | Rem    | Info Gain
GOOD BEHAVIOR    | true  | D1    | d4, d5, d6         | 0.9183            | 0.9183 | 0.0817
GOOD BEHAVIOR    | false | D2    | d1, d2, d3         | 0.9183            |        |
AGE < 30         | true  | D3    | d1, d3             | 0                 | 0.5409 | 0.4591
AGE < 30         | false | D4    | d2, d4, d5, d6     | 0.8113            |        |
DRUG DEPENDENT   | true  | D5    | d5                 | 0                 | 0.8091 | 0.1909
DRUG DEPENDENT   | false | D6    | d1, d2, d3, d4, d6 | 0.9709            |        |
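The rem and information-gain columns above can be reproduced programmatically. The sketch below reconstructs the six-instance dataset from the partitions listed in the table (e.g., AGE < 30 is true exactly for d1 and d3, and only d1, d3, and d5 are recidivists); the dictionary layout itself is an illustrative assumption:

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = [labels.count(v) for v in set(labels)]
    return -sum((c / n) * math.log2(c / n) for c in counts)

def info_gain(data, feature, target="RECIDIVIST"):
    """Information gain of splitting `data` on `feature`."""
    total = entropy([row[target] for row in data])
    rem = 0.0
    for level in set(row[feature] for row in data):
        part = [row[target] for row in data if row[feature] == level]
        rem += (len(part) / len(data)) * entropy(part)
    return total - rem

# Dataset reconstructed from the partitions listed in the table above
data = [
    {"id": "d1", "GOOD BEHAVIOR": False, "AGE < 30": True,  "DRUG DEPENDENT": False, "RECIDIVIST": True},
    {"id": "d2", "GOOD BEHAVIOR": False, "AGE < 30": False, "DRUG DEPENDENT": False, "RECIDIVIST": False},
    {"id": "d3", "GOOD BEHAVIOR": False, "AGE < 30": True,  "DRUG DEPENDENT": False, "RECIDIVIST": True},
    {"id": "d4", "GOOD BEHAVIOR": True,  "AGE < 30": False, "DRUG DEPENDENT": False, "RECIDIVIST": False},
    {"id": "d5", "GOOD BEHAVIOR": True,  "AGE < 30": False, "DRUG DEPENDENT": True,  "RECIDIVIST": True},
    {"id": "d6", "GOOD BEHAVIOR": True,  "AGE < 30": False, "DRUG DEPENDENT": False, "RECIDIVIST": False},
]

for f in ("GOOD BEHAVIOR", "AGE < 30", "DRUG DEPENDENT"):
    print(f, round(info_gain(data, f), 4))
```

Running this prints 0.0817 for GOOD BEHAVIOR, 0.4591 for AGE < 30, and 0.1909 for DRUG DEPENDENT, matching the table.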
The AGE < 30 feature has the highest information gain of the three features, so this feature will be used at the root node of the tree (Ahuja, 2017). The figure below illustrates the state of the tree after we have created the root node and split the data based on AGE < 30.
This figure shows how the data moves down the tree based on the split on the AGE < 30 feature. Note that this feature no longer appears in these datasets, because we cannot split on it again.
The dataset on the left branch contains only instances where RECIDIVIST is true, so it does not need to be split any further.
The dataset on the right branch of the tree (D4) is not homogeneous, so we have to grow this branch of the tree. The entropy for this dataset, D4, is computed as follows:
H(RECIDIVIST, D4) = -((1/4) × log2(1/4) + (3/4) × log2(3/4)) = 0.8113 bits
The calculation of the information gain for the GOOD BEHAVIOR and DRUG DEPENDENT features with respect to the D4 dataset is shown in the table below:
Split by Feature | Level | Part. | Instances  | Partition Entropy | Rem    | Info Gain
GOOD BEHAVIOR    | true  | D7    | d4, d5, d6 | 0.9183            | 0.6887 | 0.1226
GOOD BEHAVIOR    | false | D8    | d2         | 0                 |        |
DRUG DEPENDENT   | true  | D9    | d5         | 0                 | 0      | 0.8113
DRUG DEPENDENT   | false | D10   | d2, d4, d6 | 0                 |        |
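The two information-gain values for the D4 partition can be checked directly. A minimal sketch, assuming the D4 membership and the sub-partitions listed in the table above (d5 is the only recidivist in D4):

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = [labels.count(v) for v in set(labels)]
    return -sum((c / n) * math.log2(c / n) for c in counts)

# D4 = {d2, d4, d5, d6}; only d5 is a recidivist (see the table above)
h_d4 = entropy([False, False, True, False])  # 0.8113 bits

# GOOD BEHAVIOR on D4: true -> {d4, d5, d6}, false -> {d2}
rem_gb = (3/4) * entropy([False, True, False]) + (1/4) * entropy([False])
# DRUG DEPENDENT on D4: true -> {d5}, false -> {d2, d4, d6}
rem_dd = (1/4) * entropy([True]) + (3/4) * entropy([False, False, False])

print(round(h_d4 - rem_gb, 4))  # information gain for GOOD BEHAVIOR: 0.1226
print(round(h_d4 - rem_dd, 4))  # information gain for DRUG DEPENDENT: 0.8113
```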
These calculations show that the DRUG DEPENDENT feature has a higher information gain than GOOD BEHAVIOR (0.8113 versus 0.1226) and so should be chosen for the next split (Pinedo, 2016). The figure below shows the state of the decision tree after the D4 partition has been split on the DRUG DEPENDENT feature.
All the datasets at the leaf nodes are now pure, so the algorithm stops growing the tree. Finally, the figure below shows the tree returned by the ID3 algorithm:
B)
RECIDIVIST = true
C)
RECIDIVIST = true
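The final tree described above (root split on AGE < 30, then DRUG DEPENDENT on the right branch) can be sketched as a small prediction function; the function name and boolean-argument interface are illustrative assumptions:

```python
def predict_recidivist(age_under_30: bool, drug_dependent: bool) -> bool:
    """Walk the learned decision tree from the root to a leaf."""
    if age_under_30:       # root split: AGE < 30
        return True        # left leaf: RECIDIVIST = true
    if drug_dependent:     # second split on the right branch
        return True        # RECIDIVIST = true
    return False           # RECIDIVIST = false

print(predict_recidivist(True, False))   # True
print(predict_recidivist(False, True))   # True
print(predict_recidivist(False, False))  # False
```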
References
Ahuja, R. K. (2017). Network flows: theory, algorithms, and applications. Pearson
Education.
Pinedo, M. L. (2016). Scheduling: theory, algorithms, and systems. Springer.