Data Mining for Predicting Patient Details Affected by Hepatitis Disease

   

Added on  2023-05-27

Data Mining

Table of Contents
1 Introduction
2 Task - 1
2.1 Suitable Tree Building algorithm
2.2 Splitting Method
2.3 Output Results
2.4 Accuracy Rates
2.5 Visual Tree diagram
3 Task - 2
3.1 Description of the Classification Model
4 Task - 3
4.1 Confidence Factor to 30%
5 Task - 4
5.1 Set the REF Parameter to TRUE
6 Task - 5
6.1 Set the parameter unpruned to TRUE
7 Task - 6
7.1 Models comparative Ability to other two models
7.2 Bayesian Network
7.3 Naïve Bayes Network
8 Task - 7
8.1 Confusion Matrix
8.2 ROC Curve
8.3 Lift Chart
9 Task - 8
9.1 Generate the set of rules along the sub tree path
10 Conclusion
References

1 Introduction
The main objective of this project is to predict the details of patients affected by
Hepatitis disease. The prediction is carried out with the Weka data mining tool.
Data prediction generally relies on two kinds of models: prediction models and
classification models. Classification models offer various methods for predicting the
required information. Here, we use a decision tree to build the classification models
for the Hepatitis disease data.
First, we choose an appropriate tree building algorithm and split the Hepatitis disease
data into training and testing parts. Once an appropriate algorithm is selected, we
provide a detailed technical description of the classification model. Next, we apply the
tree induction method with different settings: change the confidence factor to 30%, set
the parameter unpruned to TRUE, and set reduced error pruning to TRUE. For each setting,
we explain how the classification model is built and report how the model accuracy
changes. Additionally, we show the ROC curve, lift chart and confusion matrix for the
models. Finally, we generate the set of rules.
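The three pruning settings named above correspond to standard J48 command-line options (-C for the confidence factor, -U for an unpruned tree, -R for reduced error pruning). The following is only a sketch of how the three variants could be written down as option lists. The file name hepatitis.arff, the variant names and the helper j48_command are illustrative assumptions, and the values for -M and -N are simply the J48 defaults.

```python
# Option lists for the three J48 variants described in the text.
# -C, -U, -R, -M and -N are J48's command-line flags; the dictionary keys
# and the ARFF file name are hypothetical and only used for illustration.
J48_VARIANTS = {
    "confidence_30": ["-C", "0.30", "-M", "2"],          # confidence factor 30%
    "unpruned": ["-U", "-M", "2"],                       # no pruning at all
    "reduced_error_pruning": ["-R", "-N", "3", "-M", "2"],  # REP with 3 folds
}


def j48_command(variant, arff_path="hepatitis.arff"):
    """Build the argument list for running one J48 variant from the command line."""
    base = ["java", "weka.classifiers.trees.J48", "-t", arff_path]
    return base + J48_VARIANTS[variant]


if __name__ == "__main__":
    for name in J48_VARIANTS:
        print(name, "->", " ".join(j48_command(name)))
```

The variants are kept separate because J48 does not accept a confidence factor together with the unpruned or reduced error pruning options.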
2 Task - 1
2.1 Suitable Tree Building algorithm
To predict the patient information in the Hepatitis disease data, the J48 decision tree
algorithm is selected. The J48 classifier is used to provide an effective
prediction for the provided data (Arabnia, Stahlbock, Abou-Nasr & Weiss, n.d.).
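J48 is Weka's implementation of the C4.5 algorithm, which chooses the attribute to split on by its gain ratio. The sketch below shows that criterion for nominal attributes only. The toy attribute values and class labels are hypothetical, and the handling of numeric attributes, missing values and pruning that C4.5 also performs is left out.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of a list of nominal values."""
    total = len(items)
    return -sum((n / total) * log2(n / total) for n in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of splitting `labels` by the nominal attribute `values`."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Expected entropy of the class after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(values)  # penalises attributes with many values
    return info_gain / split_info if split_info > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical toy data: four patients, one class label each.
    labels = ["Yes", "Yes", "No", "No"]
    informative = ["high", "high", "low", "low"]   # separates the classes
    uninformative = ["a", "b", "a", "b"]           # tells us nothing
    print(gain_ratio(informative, labels))
    print(gain_ratio(uninformative, labels))
```

An attribute that separates the classes perfectly gets the highest gain ratio, while one that leaves each branch as mixed as the whole data set gets a gain ratio of zero.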
2.2 Splitting Method
To evaluate a classifier, use the splitting method, which splits the provided data set
randomly into a training set and a testing set.
Different splits produce different training and testing sets, so the results differ
slightly from one split to the next.
If we evaluate the classifier many times, each time on a different split, we can
calculate the standard deviation of these slightly different results.
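The repeated splitting described above can be sketched as follows. This is a minimal illustration, not the procedure Weka runs internally: the 100 records are hypothetical, the 66% training fraction is an assumption, and a majority-class predictor stands in for the real classifier so that the example stays self-contained.

```python
import random
from collections import Counter
from statistics import mean, stdev


def random_split(records, train_fraction, seed):
    """Shuffle a copy of the records with a fixed seed and cut it in two."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def majority_label(train):
    """Stand-in classifier: always predict the most common training label."""
    return Counter(label for _, label in train).most_common(1)[0][0]


def accuracy(predicted_label, test):
    """Fraction of test records whose label equals the predicted label."""
    return sum(1 for _, label in test if label == predicted_label) / len(test)


def repeated_holdout(records, train_fraction=0.66, seeds=range(10)):
    """Evaluate once per seed; each seed gives a different random split."""
    scores = []
    for seed in seeds:
        train, test = random_split(records, train_fraction, seed)
        scores.append(accuracy(majority_label(train), test))
    return scores


# Hypothetical data: 100 records, 60 labelled "No" and 40 labelled "Yes".
records = [(i, "No") for i in range(60)] + [(i, "Yes") for i in range(60, 100)]

if __name__ == "__main__":
    scores = repeated_holdout(records)
    print("accuracy per split:", [round(s, 3) for s in scores])
    print("mean:", round(mean(scores), 3))
    print("standard deviation:", round(stdev(scores), 3))
```

Fixing the seed makes each individual split reproducible, while changing the seed shows how much the accuracy depends on the particular split.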
2.3 Output Results
The classification model is built with the J48 decision tree algorithm by following
the steps below (Belloc, 1967).
Open the Weka tool.
Click Explorer and load the provided Hepatitis data file.
Then click the Classify tab, open the Trees group, and choose J48.
These steps are illustrated below.
2.4 Accuracy Rates
The accuracy rates for the provided data are shown below (Han, Kamber & Pei, 2012).
TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  PRC Area  Class
0.953    0.209    0.853      0.953   0.900      0.763  0.910     0.889     No
0.791    0.047    0.930      0.791   0.855      0.763  0.910     0.890     Yes
0.882    0.138    0.887      0.882   0.880      0.763  0.910     0.889     Weighted Avg.
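Some columns of the table above follow from others, which gives a quick consistency check. The sketch below recomputes the F-Measure of each class from its precision and recall, and checks that, with two classes, the FP rate of one class equals one minus the TP rate of the other. The function and variable names are illustrative; the numbers are copied from the table.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Per-class values copied from the accuracy table.
rows = {
    "No": {"tp_rate": 0.953, "fp_rate": 0.209, "precision": 0.853,
           "recall": 0.953, "f_measure": 0.900},
    "Yes": {"tp_rate": 0.791, "fp_rate": 0.047, "precision": 0.930,
            "recall": 0.791, "f_measure": 0.855},
}

if __name__ == "__main__":
    for label, row in rows.items():
        recomputed = round(f_measure(row["precision"], row["recall"]), 3)
        print(label, "F-Measure in table:", row["f_measure"],
              "recomputed:", recomputed)

    # With two classes, instances of one class that are misclassified
    # are exactly the false positives of the other class.
    print("FP rate of Yes from TP rate of No:", round(1 - rows["No"]["tp_rate"], 3))
    print("FP rate of No from TP rate of Yes:", round(1 - rows["Yes"]["tp_rate"], 3))
```

Both checks agree with the reported values to three decimal places.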
2.5 Visual Tree diagram
The visual tree diagram for the Hepatitis disease data is illustrated below.
3 Task - 2
3.1 Description of the Classification Model
Classification is the process of building a model of the classes from a large set of
records that contain the class labels. The decision tree algorithm is used to discover how
the attribute vector behaves across a number of instances. Likewise, on the basis of the
training instances, the classes of newly generated instances are found. This algorithm produces the