
Data Mining: Classification, Numeric Prediction, Clustering and Association Finding

   

Added on  2022-10-12

University
Semester
Data Mining
Student ID
Student Name
Submission Date

Table of Contents
Part 1 – Classification
1. Task 1 – Classifier
2. Task 2 - J48 Classifier
3. Task 3 - Reset J48 parameters
4. Task 4 - IBK Classifier
5. Task 5 - Predictive Accuracy
6. Task 6 – Accuracy
7. Task 7 - Golden Nuggets
8. Task 8 – Attribute Selection Algorithm
Part 2 - Numeric Prediction
1. Task 1 – Classifiers
2. Task 2 - Explore Different Parameters Settings
3. Task 3 - Investigation
4. Golden Nuggets
Part 3 – Clustering
1. Task 1 - K Means Clustering
2. Task 2 - Effects of Seeds
3. Task 3 - EM Algorithm
4. Task 4 - Normalize Filter
5. Task 5 - Values Changes
6. Task 6 – Clusters
7. Task 7 - Compare K Means and EM clustering
8. Task 8 - Golden Nuggets
Part 4 - Association Finding
1. Task 1 – Representation
2. Task 2 - Apriori Algorithm - Groceries Data set 1
3. Task 3 - Explore Different Possibilities
4. Task 4 - Apriori Algorithm - Groceries Data set 2
5. Task 5 - Explore Different Possibilities
6. Task 6 - Other Associators
7. Task 7 - Golden Nuggets
References

Part 1 – Classification
1. Task 1 – Classifier
The training and cross-validation error statistics for the Zero R, One R, J48 and IBK
classifiers are shown below.
Zero R
Correctly Classified Instances 700 70 %
Incorrectly Classified Instances 300 30 %
Kappa statistic 0
Mean absolute error 0.4202
Root mean squared error 0.4583
Relative absolute error 100 %
Root relative squared error 100 %
Total Number of Instances 1000
One R
Correctly Classified Instances 743 74.3 %
Incorrectly Classified Instances 257 25.7 %
Kappa statistic 0.3009
Mean absolute error 0.257
Root mean squared error 0.507
Relative absolute error 61.1672 %
Root relative squared error 110.6259 %
Total Number of Instances 1000
J48
Correctly Classified Instances 855 85.5 %
Incorrectly Classified Instances 145 14.5 %
Kappa statistic 0.6251
Mean absolute error 0.2312
Root mean squared error 0.34
Relative absolute error 55.0377 %
Root relative squared error 74.2015 %
Total Number of Instances 1000
IBK
Correctly Classified Instances 1000 100 %
Incorrectly Classified Instances 0 0 %
Kappa statistic 1
Mean absolute error 0.001
Root mean squared error 0.001
Relative absolute error 0.2375 %
Root relative squared error 0.2178 %
Total Number of Instances 1000
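The relative error figures in these tables are measured against the Zero R baseline, which is why Zero R scores exactly 100% on both. The sketch below reproduces Zero R's MAE and RMSE from the 700/300 class split, then One R's relative errors from its 257 misclassifications. It assumes that Zero R predicts a Laplace-smoothed class prior (one added to each class count) and that errors are averaged over both class probabilities; these assumptions match the reported figures but are not taken from the Weka source, and the helper names are hypothetical.

```python
import math

# Class counts taken from the Zero R output: 700 "good" and 300 "bad" instances.
N_GOOD, N_BAD = 700, 300
N = N_GOOD + N_BAD

# ASSUMPTION: Zero R predicts a Laplace-smoothed prior (one added to each count).
p_good = (N_GOOD + 1) / (N + 2)
p_bad = (N_BAD + 1) / (N + 2)


def zero_r_errors():
    """MAE and RMSE of a model that always outputs (p_good, p_bad).

    ASSUMPTION: errors are averaged over both class probabilities, so each
    instance contributes (|error on good| + |error on bad|) / 2.
    """
    # Target vectors: "good" is (1, 0) and "bad" is (0, 1).
    abs_good = (abs(p_good - 1) + abs(p_bad - 0)) / 2
    abs_bad = (abs(p_good - 0) + abs(p_bad - 1)) / 2
    sq_good = ((p_good - 1) ** 2 + (p_bad - 0) ** 2) / 2
    sq_bad = ((p_good - 0) ** 2 + (p_bad - 1) ** 2) / 2
    mae = (N_GOOD * abs_good + N_BAD * abs_bad) / N
    rmse = math.sqrt((N_GOOD * sq_good + N_BAD * sq_bad) / N)
    return mae, rmse


def hard_prediction_errors(n_wrong):
    """MAE and RMSE for a model that outputs 0/1 probabilities (like One R).

    A wrong prediction contributes 1 to both averaged errors, a correct one 0.
    """
    rate = n_wrong / N
    return rate, math.sqrt(rate)


base_mae, base_rmse = zero_r_errors()
one_r_mae, one_r_rmse = hard_prediction_errors(257)  # 257 misclassified by One R

# Relative errors compare a model's errors with the Zero R baseline.
print(f"Zero R MAE  = {base_mae:.4f}, RMSE = {base_rmse:.4f}")
print(f"One R  RAE  = {100 * one_r_mae / base_mae:.2f}%, "
      f"RRSE = {100 * one_r_rmse / base_rmse:.2f}%")
```

Running it prints a Zero R MAE of 0.4202 and RMSE of 0.4583, and One R relative errors of 61.17% and 110.63%. Because One R outputs hard 0/1 predictions, each mistake costs the maximum squared error, which is why its RRSE exceeds 100% even though its accuracy beats Zero R.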
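Based on the training and cross-validation errors for the Zero R, One R, J48 and IBK
classifiers, IBK has the lowest errors of the four and correctly classifies 100% of the
instances, so it appears to be the best classifier (Azzalini and Scarpa, 2012). This perfect
score should be read with caution, however: with its default setting of one nearest neighbour,
IBK classifies each training instance by matching it to itself, so 100% accuracy on the training
data is expected and reflects memorization rather than the ability to generalize. The
cross-validation error is therefore the more reliable basis for comparing the classifiers.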
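Why a 1-nearest-neighbour classifier scores 100% on its own training data can be seen on a small example. The sketch below uses a hypothetical one-dimensional data set (not the credit data above) and a hypothetical helper `nearest_label`, and compares accuracy measured on the training data with leave-one-out cross-validation. It assumes k = 1 (IBK's default) and breaks distance ties by taking the earliest point.

```python
def nearest_label(x, points, labels, exclude=None):
    """Return the label of the point nearest to x (1-NN, absolute distance).

    The point at index `exclude` is skipped; ties go to the lowest index
    because min() returns the first minimal candidate.
    """
    candidates = [j for j in range(len(points)) if j != exclude]
    best = min(candidates, key=lambda j: abs(points[j] - x))
    return labels[best]


# Hypothetical toy data, used only for illustration.
X = [0.0, 1.0, 2.0, 3.0, 10.0, 11.0]
y = ["a", "a", "b", "a", "b", "b"]

# Training accuracy: each instance is classified while it is still in the data.
train_acc = sum(nearest_label(x, X, y) == t for x, t in zip(X, y)) / len(X)

# Leave-one-out accuracy: each instance is classified with itself removed.
loo_acc = sum(
    nearest_label(X[i], X, y, exclude=i) == y[i] for i in range(len(X))
) / len(X)

print(f"training accuracy      = {train_acc:.3f}")  # prints 1.000
print(f"leave-one-out accuracy = {loo_acc:.3f}")    # prints 0.667
```

Every point is its own nearest neighbour while it stays in the data, giving perfect training accuracy; once it is held out, the points at 2.0 and 3.0 are classified by neighbours of the other class, so leave-one-out accuracy falls to 4/6.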
2. Task 2 - J48 Classifier
The selected values of C (the confidence factor used for pruning) and M (the minimum number of instances per leaf) are presented below.
C value - 0.25
M value - 2
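In C4.5-style pruning, which J48 implements, the observed error rate at a leaf is replaced by a pessimistic upper confidence limit, and smaller C values give higher estimates and therefore more pruning. The sketch below shows that effect using a normal approximation (with continuity correction) to the upper limit of the binomial error rate, for a hypothetical leaf of 20 instances with 2 errors. It illustrates the idea under these assumptions and is not Weka's exact pruning code; `pessimistic_errors` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def pessimistic_errors(n, e, cf):
    """Pessimistic (upper-limit) estimate of the errors at a leaf.

    n: instances reaching the leaf; e: misclassified instances (0 <= e < n);
    cf: confidence factor, i.e. J48's C.
    ASSUMPTION: normal approximation to the binomial upper confidence limit.
    """
    if not 0 <= e < n:
        raise ValueError("expected 0 <= e < n")
    z = NormalDist().inv_cdf(1 - cf)  # smaller cf gives a larger z
    f = (e + 0.5) / n                 # observed error rate with continuity correction
    upper = (f + z * z / (2 * n)
             + z * math.sqrt(f / n - f * f / n + z * z / (4 * n * n))) / (1 + z * z / n)
    return upper * n


# Hypothetical leaf: 20 instances, 2 misclassified.
for cf in (0.5, 0.25, 0.1):
    print(f"C = {cf}: estimated errors = {pessimistic_errors(20, 2, cf):.2f}")
```

With these inputs the estimate rises as C falls, from 2.5 errors at C = 0.5 (where z = 0) to about 3.7 at C = 0.25 and about 5.0 at C = 0.1, so a subtree is more likely to be replaced by a leaf when C is small.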
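These values, which are also J48's default settings, help limit overfitting: smaller C values prune the tree more aggressively, and M prevents leaves that cover very few instances. The results, which are identical to the J48 results in Task 1, are presented below.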
=== Summary ===
Correctly Classified Instances 855 85.5 %
Incorrectly Classified Instances 145 14.5 %
Kappa statistic 0.6251
Mean absolute error 0.2312
Root mean squared error 0.34
Relative absolute error 55.0377 %
Root relative squared error 74.2015 %
Total Number of Instances 1000
=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  PRC Area  Class
               0.956    0.380    0.854      0.956   0.902      0.640  0.857     0.905     good
               0.620    0.044    0.857      0.620   0.720      0.640  0.857     0.783     bad
Weighted Avg.  0.855    0.279    0.855      0.855   0.847      0.640  0.857     0.869
=== Confusion Matrix ===
   a   b   <-- classified as
 669  31 |   a = good
 114 186 |   b = bad
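Every figure in the Detailed Accuracy By Class table, and the Kappa statistic in the summary, can be derived from the confusion matrix. The sketch below recomputes them with the standard formulas, treating each class in turn as the positive class; it is a check on the reported numbers rather than Weka's own code, and the names `matrix` and `class_metrics` are hypothetical. ROC and PRC areas are omitted because they need the predicted probabilities, not just the matrix.

```python
import math

# J48 confusion matrix from the output above (outer key: actual, inner key: predicted).
matrix = {"good": {"good": 669, "bad": 31},
          "bad": {"good": 114, "bad": 186}}
classes = ["good", "bad"]
n = sum(sum(row.values()) for row in matrix.values())


def class_metrics(c):
    """Per-class rates, treating class c as the positive class."""
    tp = matrix[c][c]
    fn = sum(v for k, v in matrix[c].items() if k != c)
    fp = sum(matrix[o][c] for o in classes if o != c)
    tn = n - tp - fn - fp
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # identical to the TP rate
    fp_rate = fp / (fp + tn)
    f_measure = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"tp_rate": recall, "fp_rate": fp_rate, "precision": precision,
            "f_measure": f_measure, "mcc": mcc}


# Cohen's kappa: observed agreement corrected for agreement expected by chance.
observed = sum(matrix[c][c] for c in classes) / n
expected = sum((sum(matrix[c].values()) / n) * (sum(matrix[o][c] for o in classes) / n)
               for c in classes)
kappa = (observed - expected) / (1 - expected)

for c in classes:
    print(c, {k: round(v, 3) for k, v in class_metrics(c).items()})
print("kappa", round(kappa, 4))
```

For example, the TP rate for "good" is 669 / 700 and its precision is 669 / 783, giving the 0.956 and 0.854 reported above; the Kappa statistic comes out at 0.6251, as in the summary.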
3. Task 3 - Reset J48 parameters
Here, the parameters are reset as follows:
C value - 0.25