
Weka: Data Mining and Machine Learning Topic 2022

   

Added on  2022-09-28

Running Head: WEKA 1
TOPIC
NAME OF STUDENT
DATE
INTRODUCTION
Five models are to be tested on three different datasets: Multilayer Perceptron,
Naïve Bayes, J48, Random Forest and REPTree. The models are run with Weka's
default settings and evaluated with 10-fold cross-validation. The datasets on
which performance is tested are the Breast-cancer, Diabetes and Breast datasets.
In evaluating the five algorithms on the three datasets, the interpretation of
the results focuses on the following measures: correctly classified instances
(with the percentage), incorrectly classified instances, the kappa statistic,
true and false positive rates, precision, recall, F-measure and ROC (Receiver
Operating Characteristic) area. To begin with, it is important to note that we
cannot just rely on the
performance percentages of the model as an evaluation criterion. This is because
a model can have a high percentage of correctly classified instances and yet
assign every instance to a single class, leaving the other classes without any
predictions. Therefore, the confusion matrix and the remaining measures must
also be examined. To give a brief meaning of each measure, we start with the
kappa statistic, which measures the agreement between the predicted and the
actual classes after correcting for the agreement expected by chance. A kappa
value of 0 means the model does no better than chance, and the closer the value
is to 1 the more reliable the classification. The true positive rate (TP Rate)
is the proportion of instances of a class that are correctly classified as that
class, and the closer the value is to 1 the better the classification. The
false positive rate (FP Rate) should be close to zero (0), as it gives the
proportion of instances of other classes that were wrongly assigned to the
class, so the lower the value the better the model (Jabez et al. 2019). F-measure is an
average (more precisely, the harmonic mean) of precision and recall, values that
are obtained from the confusion matrix produced by each classification
algorithm. The ROC area is the area under the ROC curve, which plots the true
positive rate of the model against its false positive rate. The greater the ROC
area, and the closer it is to 1, the better the model separates the classes.
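
The measures described above can be illustrated with a minimal sketch that derives them from a two-class confusion matrix. The function name binary_metrics and the counts used below are hypothetical and chosen only for illustration; they are not results from the Breast-cancer, Diabetes or Breast datasets, and the code is not Weka's own implementation. The first example shows the caveat about accuracy: a model that assigns every instance to the majority class reaches 90% accuracy, yet its kappa, TP rate and F-measure are all 0.

```python
def binary_metrics(tp, fn, fp, tn):
    """Compute common evaluation measures for the positive class of a
    2x2 confusion matrix (hypothetical helper, not part of Weka)."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total

    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected) if expected < 1 else 0.0

    # Guard each ratio against an empty denominator.
    tp_rate = tp / (tp + fn) if (tp + fn) else 0.0   # also called recall
    fp_rate = fp / (fp + tn) if (fp + tn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp_rate

    # F-measure is the harmonic mean of precision and recall.
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0

    return {
        "accuracy": accuracy,
        "kappa": kappa,
        "tp_rate": tp_rate,
        "fp_rate": fp_rate,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical counts: 10 positive and 90 negative instances.
# Model A predicts "negative" for everything.
majority_only = binary_metrics(tp=0, fn=10, fp=0, tn=90)
# Model B finds most positives at the cost of a few false alarms.
balanced = binary_metrics(tp=8, fn=2, fp=5, tn=85)

for name, result in (("majority-only", majority_only), ("balanced", balanced)):
    print(name)
    for measure, value in result.items():
        print(f"  {measure}: {value:.3f}")
```

With these made-up counts the two models have similar accuracy, but only the second has a kappa and F-measure well above 0, which is why the confusion matrix and the other measures are read alongside the percentage of correctly classified instances.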
DATA MINING AND MACHINE LEARNING
a. Breast-cancer
