Using Classification Trees: Data Examples, Benefits, and Pruning

Added on  2023/06/08

Discussion Board Post
Summary
This discussion post delves into the application of classification trees, beginning with a description of the data they handle, including hypothetical examples related to vehicle characteristics. It explores the benefits of using classification trees, such as ease of understanding, simple coding, handling of non-linear data, and fast prediction, with the aim of implementing a cost-effective and reliable classifier. The post also discusses selecting significant predictors with the random forest algorithm in R, emphasizing the influence of correlation between predictors. Finally, it explains tree pruning, which reduces overfitting and yields a more generalized, accurate decision tree by removing redundant branches and rectifying errors caused by poor predictor choices.
Running head: CLASSIFICATION TREES 1
Classification Trees
Name of the Student
Name of the Instructor
Course Code
Date
1. What is the data like (with some examples)? Consider providing some hypothetical
examples.
Answer 1: Decision trees can handle high-dimensional data; the data belong to the dataset that the
decision tree processes (Ghosh, Roy & Bandyopadhyay, 2012). The decision tree is trained to process
the dataset according to conditions learned during training, and the dataset is split using these
conditions to build the tree (Shaikhina et al., 2017). The decision tree uses a top-down splitting
method to break the dataset down. The data in the provided dataset must be homogeneous for the
decision tree to process it. An example of such data is provided below.
   mpg   cylinders  displacement  horsepower  weight  acceleration  year  origin  name
0  good  8          307.0         432         3504    5.6           70    1       Chevrolet Camero
1  bad   8          350.0         165         3693    11.5          70    1       BMW M6
2  good  8          318.0         150         3436    11.0          70    1       Plymouth satellite
3  bad   8          304.0         220         2574    8.0           70    1       Lotus Elise
4  ok    8          302.0         140         2914    10.5          70    1       Ford Figo
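The top-down splitting described above can be sketched in a few lines. The following is a minimal, hypothetical illustration (not the author's implementation) of choosing a single CART-style split by Gini impurity; the horsepower values and mpg classes are invented for the example:

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(rows, labels, feature_index):
    """Try every observed value of one numeric feature as a threshold and
    return the one that minimises the weighted Gini impurity of the split."""
    best_threshold, best_impurity = None, float("inf")
    for t in sorted(set(r[feature_index] for r in rows)):
        left = [l for r, l in zip(rows, labels) if r[feature_index] <= t]
        right = [l for r, l in zip(rows, labels) if r[feature_index] > t]
        if not left or not right:
            continue  # a split must put at least one row on each side
        w = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if w < best_impurity:
            best_threshold, best_impurity = t, w
    return best_threshold, best_impurity

# Hypothetical rows: (horsepower, weight) and an mpg class per vehicle.
rows = [(130, 3504), (165, 3693), (150, 3436), (220, 2574), (140, 2914)]
labels = ["good", "bad", "good", "bad", "good"]
threshold, impurity = best_split(rows, labels, 0)
# Splitting on horsepower <= 150 separates the classes perfectly here.
```

A full tree builder would apply this step recursively to each partition, which is the top-down method the answer refers to.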
Figure 1: Decision Tree Example
(Source: Fayyad & Uthurusamy, 2016)
2. What type of benefit do you hope to get from classification trees? What major questions
are you attempting to answer?
Answer 2: The benefit expected from the classification tree is as follows:
1. The tree is easily interpreted and understood by a human, based on the rules that are
used for building it (Ghosh, Roy & Bandyopadhyay, 2012).
2. Its ease of coding makes classification much simpler than many other
classification methods.
3. Non-linear data can be classified by a classification tree, so no ordering of the
dataset is required. It thus provides a more flexible approach.
4. Prediction is quite fast using classification trees; however, this depends on the
dataset and the number of alternatives present in the tree.
5. The construction of the tree is fast and cheap.
Given these benefits, the attempt is made to implement a fast, cheap and reliable
classifier of the dataset through which predictions can be made with minimal time and cost,
and which is understood by everyone because it is represented visually.
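The speed of prediction (benefit 4) comes from the fact that classifying one record is a single walk from the root to a leaf, so the cost grows with the tree's depth rather than the size of the training data. A minimal sketch with a hypothetical two-level tree:

```python
# Hypothetical tree: internal nodes test one feature against a threshold,
# leaves carry a class label. Feature names and values are illustrative.
tree = {
    "feature": "horsepower", "threshold": 150,
    "left": {"leaf": "good"},
    "right": {"feature": "weight", "threshold": 3000,
              "left": {"leaf": "ok"},
              "right": {"leaf": "bad"}},
}

def predict(node, record):
    """Walk from root to leaf; one comparison per level of the tree."""
    while "leaf" not in node:
        branch = "left" if record[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["leaf"]

label = predict(tree, {"horsepower": 165, "weight": 3693})
```

However deep the dataset's feature space, each prediction costs at most one comparison per level, which is why tree prediction stays fast on large datasets.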
3. Describe how you would go about deciding the most significant predictors in your model.
Answer 3: Choosing the predictors is the most crucial part of building a decision tree, since they
determine how the classification tree reaches the most optimal outcome. The random forest
algorithm is used to rank the predictors and split the nodes through predefined validation
mechanisms (Shaikhina et al., 2017). Correlation between the predictors influences their
permutation importance. The algorithm is implemented in the R language: after the randomForest
package is installed and loaded with the library function, randomForest() is called with the
response variable set for the prediction, and the result is plotted using varImpPlot().
The following code is used for the implementation:

library(randomForest)  # load the package after installing it
rffit <- randomForest(resp ~ v0 + v1 + v2 + v3 + v4 + v5 + v6 + v7 + v8 + v9,
                      data = data, ntree = 2000,
                      keep.forest = FALSE, importance = TRUE)
varImpPlot(rffit)      # plot the variable-importance measures

where resp is the response variable and v0 ... v9 are the predictor variables.
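The permutation-importance idea behind that randomForest call can be shown in plain code: shuffle one predictor column, re-score the model, and treat the drop in accuracy as that predictor's importance. The sketch below uses a toy hand-written model and invented data, not the R package's internals:

```python
import random

def accuracy(model, rows, labels):
    """Fraction of rows the model classifies correctly."""
    return sum(model(r) == l for r, l in zip(rows, labels)) / len(rows)

def permutation_importance(model, rows, labels, col, seed=0):
    """Importance of column `col` = accuracy drop after shuffling it."""
    base = accuracy(model, rows, labels)
    shuffled = [r[col] for r in rows]
    random.Random(seed).shuffle(shuffled)
    permuted = [list(r) for r in rows]
    for r, v in zip(permuted, shuffled):
        r[col] = v
    return base - accuracy(model, permuted, labels)

# Toy model that only looks at column 0 (horsepower):
model = lambda r: "bad" if r[0] > 150 else "good"
rows = [[130, 3504], [165, 3693], [150, 3436], [220, 2574], [140, 2914]]
labels = ["good", "bad", "good", "bad", "good"]

# Shuffling the ignored weight column changes nothing, so its
# importance is zero; shuffling horsepower can only hurt accuracy.
unused = permutation_importance(model, rows, labels, 1)
used = permutation_importance(model, rows, labels, 0)
```

This is also where the answer's point about correlated predictors matters: if two predictors carry the same information, shuffling one barely hurts the model, so both can look less important than they really are.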
4. Describe "pruning of a tree". What is it and why is it necessary?
Answer 4: Pruning is used to obtain a more generalized tree. It reduces the depth of the tree and
the overfitting of the data in the decision tree (Phan, 2014). Pruning regulates the tree by
applying a filter that removes unnecessary, redundant, or excess branches, giving the tree a more
generalized form. Reduced-error pruning is the method most commonly used; it rectifies errors
caused by the selection of poor predictors (Yang et al., 2015). Because these defects are removed
from the decision tree, pruning is necessary.
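Reduced-error pruning can be sketched as follows: working bottom-up, replace an internal node with a leaf carrying its majority class whenever doing so does not lower accuracy on a held-out validation set. The tree layout, `majority` field, and validation data below are hypothetical illustrations, not a quote of any particular library:

```python
def predict(node, record):
    while "leaf" not in node:
        node = node["left"] if record[node["feature"]] <= node["threshold"] else node["right"]
    return node["leaf"]

def accuracy(tree, rows, labels):
    return sum(predict(tree, r) == l for r, l in zip(rows, labels)) / len(labels)

def prune(tree, node, val_rows, val_labels):
    """Bottom-up reduced-error pruning; mutates the tree in place."""
    if "leaf" in node:
        return
    prune(tree, node["left"], val_rows, val_labels)
    prune(tree, node["right"], val_rows, val_labels)
    before = accuracy(tree, val_rows, val_labels)
    saved = dict(node)
    node.clear()
    node["leaf"] = saved["majority"]   # collapse subtree to a leaf
    if accuracy(tree, val_rows, val_labels) < before:
        node.clear()
        node.update(saved)             # pruning hurt: restore the subtree

# Each internal node stores the majority class seen at it during training.
tree = {"feature": "horsepower", "threshold": 150, "majority": "good",
        "left": {"leaf": "good"},
        "right": {"feature": "weight", "threshold": 3000, "majority": "bad",
                  "left": {"leaf": "ok"}, "right": {"leaf": "bad"}}}
val_rows = [{"horsepower": 130, "weight": 3504},
            {"horsepower": 165, "weight": 3693}]
val_labels = ["good", "bad"]
prune(tree, tree, val_rows, val_labels)
```

On this validation set the deeper weight test adds nothing, so its subtree collapses to a single leaf while the useful root split survives, which is exactly the generalizing effect the answer describes.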
References
Fayyad, U. M., & Uthurusamy, R. (2016). Induction of Decision Trees from Inconclusive
Data. Machine Learning Proceedings 1989, 146.
Ghosh, S., Roy, S., & Bandyopadhyay, S. (2012). A tutorial review on Text Mining
Algorithms. International Journal of Advanced Research in Computer and
Communication Engineering, 1(4), 7.
Phan, T. (2014, September). Improving activity recognition via automatic decision tree pruning.
In Proceedings of the 2014 ACM International Joint Conference on Pervasive and
Ubiquitous Computing: Adjunct Publication (pp. 827-832). ACM.
Shaikhina, T., Lowe, D., Daga, S., Briggs, D., Higgins, R., & Khovanova, N. (2017). Decision
tree and random forest models for outcome prediction in antibody incompatible kidney
transplantation. Biomedical Signal Processing and Control.
Yang, L., Chen, J., Hua, J., Kang, M., & Dong, Q. (2015, September). Interactive Pruning
Simulation of Apple Tree. In International Conference on Computer and Computing
Technologies in Agriculture (pp. 604-611). Springer, Cham.