DATA 25: Comparative Analysis of Classification Models in Data Mining


Added on  2022/09/28

Report
AI Summary
This report, created for DATA 25, provides a comparative analysis of five classification algorithms: Multilayer Perceptron, Naive Bayes, J48, Random Forest, and REPtree. The algorithms were tested on three datasets: Iris, Breast Cancer, and Diabetes, using the Weka data mining tool. The report includes descriptive statistics, graphical representations, and performance evaluations based on 10-fold cross-validation, focusing on classification accuracy and confusion matrices. Task 2 explores data mining applications, discussing its role in business decision-making, customer relationships, and product development, including a case study on the development of electric cars. The report also outlines the stages of data mining, from data source to deployment, and highlights key parameters like classification, sequence analysis, clustering, and forecasting. References to relevant literature are also included.
Running Head: DATA 1
Data Mining
Name of Student
Date
Classification Performance Evaluation
This part compares the performance of the five classification models on three datasets. All five
models are run on each dataset in turn, and their performance is then compared dataset by
dataset.
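The report's summary states that the models are evaluated with 10-fold cross-validation in Weka. As a rough illustration of what that partitioning involves, the sketch below splits 150 instance indices (the size of the Iris dataset) into ten disjoint test folds. It is plain Python rather than Weka, the function names and the seed are hypothetical, and it does not stratify folds by class, so it should be read as a sketch of the idea and not as Weka's procedure.

```python
import random

def k_fold_indices(n_instances, k=10, seed=1):
    """Split range(n_instances) into k disjoint test folds after shuffling."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th index starting at i, so fold sizes differ by at most 1.
    return [indices[i::k] for i in range(k)]

def train_test_splits(n_instances, k=10, seed=1):
    """Yield (train, test) index lists; each instance is tested exactly once."""
    folds = k_fold_indices(n_instances, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# 150 instances, as in the Iris dataset; each fold holds out 15 for testing.
splits = list(train_test_splits(150, k=10))
```

Each model would be trained on the 135 training indices of a split and scored on the 15 held-out indices, with the ten scores then combined.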
A. Iris Dataset
We begin with the structure of the dataset, followed by the graphs and the descriptive statistics
for each variable.
figure 1
Figure 1 shows that the dataset contains 150 instances and a total of 5 variables.
figure 2
Figure 2 shows that the variable is named sepal length and has 35 distinct values. The variable is
numeric and no cell has a missing entry. The statistics recorded are the mean, maximum, minimum
and standard deviation (Eldén, 2019). The following figures give the descriptive statistics for
each of the remaining variables.
figure 3
figure 4
figure 5
figure 6
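The per-variable statistics described for these figures (distinct values, missing entries, minimum, maximum, mean and standard deviation) can be sketched in a few lines. This is a minimal illustration in plain Python, not Weka's implementation; the `describe` helper and the sample values are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption.

```python
import statistics

def describe(values):
    """Summarise one numeric column; None marks a missing cell."""
    present = [v for v in values if v is not None]
    return {
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "min": min(present),
        "max": max(present),
        "mean": statistics.mean(present),
        # Sample standard deviation (n - 1 denominator); an assumption here.
        "std_dev": statistics.stdev(present),
    }

# Hypothetical sample values, not the actual Iris measurements.
sample = [5.1, 4.9, 4.7, 5.1, None, 5.4]
summary = describe(sample)
```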
figure 7
Figure 7 shows the graphical distribution of every variable, both numeric and nominal.
figure 8
Figure 9 gives the scatter plot of the variables. Three variables are shown, and the clusters that
form are defined in terms of those variables.
a. Algorithms Testing
i. Multilayer Perceptron
ii. Naive Bayes
iii. J48
iv. Random Forest
v. REPTree
vi. Models comparison
Each of the screenshots provided includes summary statistics that report the number of correctly
classified instances. The more instances a model classifies correctly, the higher its accuracy
percentage and the better the classification model. On the Iris dataset, taking the models in the
order they were tested, the accuracies are 97%, 96%, 96%, 95% and 94%. The best model by this
measure is therefore the Multilayer Perceptron, and REPTree is the weakest of the five. The
confusion matrices show the same pattern, with the Multilayer Perceptron having the fewest
incorrectly classified instances (Jabez, Gowri, Vigneshwari, Mayan & Srinivasulu, 2019).
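The link between a confusion matrix and the percentage of correctly classified instances can be made concrete with a short sketch: accuracy is the sum of the diagonal divided by the total number of instances. The matrix below is hypothetical and is not taken from the report's screenshots. The `reported` percentages are the ones quoted in the text, mapped to models on the assumption that they follow the order in which the models are listed.

```python
def accuracy(confusion):
    """Fraction of instances on the diagonal of a square confusion matrix."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total

# Hypothetical 3-class matrix for 150 instances (rows: actual, columns: predicted).
example = [
    [50, 0, 0],
    [0, 47, 3],
    [0, 2, 48],
]
example_accuracy = accuracy(example)  # 145 correct out of 150

# Accuracies (%) quoted in the text; the model order is an assumption.
reported = {
    "Multilayer Perceptron": 97,
    "Naive Bayes": 96,
    "J48": 96,
    "Random Forest": 95,
    "REPTree": 94,
}
# Sort from best to worst; ties keep their listed order.
ranking = sorted(reported, key=reported.get, reverse=True)
```

Sorting by accuracy alone puts the Multilayer Perceptron first and REPTree last, matching the conclusion drawn in the text. The one-point gaps between models are small, so the ranking is only as reliable as the rounded percentages.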
B. Breast Cancer
As with the Iris dataset above, we first look at the descriptive statistics. Each figure gives the
statistics for one variable, shown under the variable name. There is a total of 10 variables.
figure 1
figure 2
figure 3
figure 4
figure 5
figure 6
figure 7
figure 8
figure 9
figure 10
figure 11
b. Algorithms Testing
i. Multilayer Perceptron
ii. Naïve Bayes