WEKA Data Mining Project: Hepatitis Prediction and Classification

Added on 2023/05/30

AI Summary

This project focuses on predicting Hepatitis using the Weka data mining tool. It employs classification and prediction models, specifically a decision tree approach with the J48 algorithm, to build a classification model based on the provided Hepatitis Weka data. The report details the selection of a suitable tree-building algorithm, the data splitting method, and the output results, including accuracy rates and a visual tree diagram. It describes the classification model and explores the impact of changing the confidence factor, setting the reduced error pruning (REP) parameter to TRUE, and setting the unpruned parameter to TRUE. The project also compares the model's performance against Bayesian and Naïve Bayes networks. Finally, it presents a confusion matrix, ROC curve, and lift chart for the models, and generates a set of rules along the sub-tree path using the JRip algorithm.

Data mining

Table of Contents
1 Introduction
2 Task - 1
2.1 Suitable Tree Building algorithm
2.2 Splitting Method
2.3 Output Results
2.4 Accuracy Rates
2.5 Visual Tree diagram
3 Task - 2
3.1 Description of the Classification Model
4 Task - 3
4.1 Confidence Factor to 30%
5 Task - 4
5.1 Set the REP Parameter to TRUE
6 Task - 5
6.1 Set the parameter unpruned to TRUE
7 Task - 6
7.1 Models comparative Ability to other two models
7.2 Bayesian Network
7.3 Naïve Bayes Network
8 Task - 7
8.1 Confusion Matrix
8.2 ROC Curve
8.3 Lift Chart
9 Task - 8
9.1 Generate the set of rules along the sub tree path
10 Conclusion
References

1 Introduction
This project uses the Weka data mining tool to identify, from patient information, the
patients affected by the Hepatitis disease. The prediction is done by using classification
and prediction models. The project applies a decision tree to build a classification model
based on the provided data set, which is the Hepatitis Weka data. The user first needs to
select a suitable tree-building algorithm for the model and a method for splitting the
provided data into training and testing sets. This report provides a brief technical
description of the classification model, which is built with the tree induction method.
Later, the confidence factor is changed to 30%, the unpruned parameter is set to TRUE, and
reduced error pruning is set to TRUE, and the resulting changes in model accuracy are
reported and explained. The report then shows the confusion matrix, ROC curve, and lift
chart for the models. Finally, a set of rules is generated.
2 Task - 1
2.1 Suitable Tree Building algorithm
Here, the user selected the J48 decision tree algorithm to predict the patient information
and successfully built the model, as illustrated in the section below ("APPLYING
DATA MINING TECHNIQUES ON ACADAMIC INSTITUTIONAL SYSTEM USING
WEKA", 2018).
2.2 Splitting Method
• You can assess a classifier by randomly splitting the provided dataset into
training and testing sets.
• Train the classifier on the former and test it on the latter. Naturally,
different splits produce slightly different results.
• If you assess the classifier several times, you can average the outcomes and
compute the standard deviation (Mitchell, 2017).
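The repeated-split evaluation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the labels are made up rather than taken from the Hepatitis data, a majority-class predictor stands in for a real classifier such as J48, and the function names are hypothetical. The 66% training share is chosen to mirror the default percentage split offered by Weka.

```python
import random
import statistics

def majority_class(labels):
    """Return the most frequent label (a stand-in for a real classifier)."""
    return max(sorted(set(labels)), key=labels.count)

def holdout_accuracy(labels, train_fraction, rng):
    """Shuffle the data, 'train' on the first part, and score on the rest."""
    shuffled = labels[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    train, test = shuffled[:cut], shuffled[cut:]
    predicted = majority_class(train)
    return sum(1 for y in test if y == predicted) / len(test)

def repeated_holdout(labels, train_fraction=0.66, repeats=10, seed=1):
    """Repeat the random split several times; return mean, std dev, and all scores."""
    rng = random.Random(seed)
    scores = [holdout_accuracy(labels, train_fraction, rng) for _ in range(repeats)]
    return statistics.mean(scores), statistics.stdev(scores), scores

# Hypothetical class labels, NOT the real Hepatitis data.
labels = ["LIVE"] * 80 + ["DIE"] * 20
mean_acc, std_acc, scores = repeated_holdout(labels)
print(f"mean accuracy = {mean_acc:.3f}, standard deviation = {std_acc:.3f}")
```

Each repetition uses a different random split, so the individual scores vary slightly; the mean and standard deviation summarize that variation.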
2.3 Output Results
The user successfully built the model by using the J48 decision tree algorithm, as
illustrated below.
• First, open the Weka tool.
• Next, click Explorer and load the provided Hepatitis Weka data.
• Then, open the Classify tab, select trees, and choose J48, as demonstrated
below.
Full Output
=== Classifier model (full training set) ===
2.4 Accuracy Rates
The accuracy rate for the built model is illustrated below (Belloc, 2010).
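For reference, the accuracy rate that Weka reports as Correctly Classified Instances is the share of test instances that fall on the diagonal of the confusion matrix. A minimal sketch of that calculation follows; the counts are made up for illustration and are not the results of this project, and the function name is hypothetical.

```python
def accuracy(confusion):
    """confusion[i][j] = number of instances of actual class i predicted as class j."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total

# Hypothetical counts for two classes, NOT taken from the report's output.
confusion = [
    [15, 5],   # actual class 0: 15 predicted correctly, 5 predicted as class 1
    [10, 70],  # actual class 1: 10 predicted as class 0, 70 predicted correctly
]
acc = accuracy(confusion)
print(f"accuracy = {acc:.1%}")
```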
2.5 Visual Tree diagram
The visual tree diagram for the built model is illustrated below (Han, Kamber & Pei, 2012).
3 Task - 2
3.1 Description of the Classification Model
In this project, a classification tree model is built for predicting the patient information
about the patients affected by the Hepatitis disease. The statistical parameters of the J48
model do not appear high, particularly in comparison with the methods considered previously.
Nevertheless, the main strength of individual classification trees stems not from the high
statistical significance of the models, but from their interpretability. To view the
classification tree in text mode, scroll up in the text field of the Classifier output
panel. The model is built by using the decision tree induction method (Kapoor, Madan &
Dave, 2017).
4 Task - 3
4.1 Confidence Factor to 30%
Here, the user changes the confidence factor to 30%, and the model accuracy does not
change, as illustrated below.
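One way to see why a small change in the confidence factor may leave the tree untouched is the textbook pessimistic error estimate used in C4.5-style pruning: the upper confidence limit on a node's error rate shifts only slightly between 25% (the J48 default) and 30%. The sketch below uses the normal-approximation formula found in data mining textbooks; it is an assumption that this approximates the behaviour here, since Weka's own implementation may differ in detail, and the node counts (2 errors out of 6 instances) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def pessimistic_error(errors, n, confidence):
    """Upper confidence limit on the error rate of a node.

    errors: misclassified training instances at the node
    n: training instances at the node
    confidence: the confidence factor, e.g. 0.25 or 0.30
    """
    f = errors / n                               # observed error rate
    z = NormalDist().inv_cdf(1.0 - confidence)   # normal deviate for this confidence
    numerator = f + z * z / (2 * n) + z * sqrt(f / n - f * f / n + z * z / (4 * n * n))
    return numerator / (1 + z * z / n)

# Hypothetical node: 2 errors among 6 training instances.
e25 = pessimistic_error(2, 6, 0.25)
e30 = pessimistic_error(2, 6, 0.30)
print(f"estimate at 25%: {e25:.3f}, estimate at 30%: {e30:.3f}")
```

A larger confidence factor gives a less pessimistic estimate and therefore less pruning. For this hypothetical node the estimate moves from roughly 0.47 to roughly 0.44, a shift that is often too small to change any pruning decision.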
5 Task - 4
5.1 Set the REP Parameter to TRUE
Setting reduced error pruning to TRUE applies a fast pruning algorithm that is intended to
increase detection accuracy when the training data is noisy. Here, it improves the accuracy
of the decision tree algorithm, as illustrated below (Maimon & Rokach, 2010).
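The idea behind reduced error pruning can be sketched as follows: part of the data is held out, and a subtree is replaced by a leaf whenever doing so does not increase the error on that held-out set. This is a simplified illustration, not Weka's implementation; the Node class, the tiny tree, the feature values, and the held-out instances are all hypothetical.

```python
class Node:
    def __init__(self, label, feature=None, children=None):
        self.label = label              # majority class at this node
        self.feature = feature          # index of the tested feature (None for a leaf)
        self.children = children or {}  # feature value -> child Node

    def is_leaf(self):
        return self.feature is None

    def predict(self, x):
        if self.is_leaf():
            return self.label
        child = self.children.get(x[self.feature])
        return child.predict(x) if child else self.label

def errors(node, data):
    """Count instances in data that the (sub)tree misclassifies."""
    return sum(1 for x, y in data if node.predict(x) != y)

def reduced_error_prune(node, holdout):
    """Prune bottom-up: keep a subtree only if it beats a single leaf on the holdout set."""
    if node.is_leaf():
        return node
    for value, child in list(node.children.items()):
        subset = [(x, y) for x, y in holdout if x[node.feature] == value]
        node.children[value] = reduced_error_prune(child, subset)
    leaf = Node(node.label)
    if errors(leaf, holdout) <= errors(node, holdout):
        return leaf
    return node

# Hypothetical tree: the inner split on feature 1 was fitted to noise.
root = Node("LIVE", feature=0, children={
    "yes": Node("LIVE", feature=1, children={"a": Node("LIVE"), "b": Node("DIE")}),
    "no": Node("DIE"),
})
# Hypothetical held-out instances: (feature values, class label).
holdout = [
    (("yes", "a"), "LIVE"), (("yes", "b"), "LIVE"), (("yes", "b"), "LIVE"),
    (("no", "a"), "DIE"), (("no", "b"), "DIE"),
]
errors_before = errors(root, holdout)
pruned = reduced_error_prune(root, holdout)
errors_after = errors(pruned, holdout)
print(f"holdout errors before pruning: {errors_before}, after: {errors_after}")
```

In this made-up case the noisy inner split is collapsed into a leaf while the useful split at the root is kept, which is how pruning against held-out data can improve accuracy on noisy training sets.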
6 Task - 5
6.1 Set the parameter unpruned to TRUE
If the user sets unpruned to TRUE, the reduced error pruning setting can no longer be
changed. This setting also improves the accuracy rate of the decision tree algorithm, as
illustrated below (Mitsa, 2010).