Digit Recognition and Pattern Analysis using Classification Techniques

Data analytics and business intelligence
Student Name:
Instructor Name:
Course Number:
24th July 2019

Section 1: Introduction
Digit recognition and pattern analysis is a well-established topic in the image recognition
field, and it offers a good opportunity to become familiar with machine learning techniques. In
this study, three classification techniques were applied: Naïve Bayes, Support Vector Machine
(SVM) and the K-Nearest Neighbor (KNN) method. Before the classification algorithms were run,
the data was preprocessed and cleaned so that it was ready for model building. One cleaning step,
applied for all three classification algorithms, was to remove every pixel that had a value of
zero in all images. The data was also split into a training dataset and a test dataset, with
about 70% of the observations in the training dataset and 30% in the test dataset. The analysis
of the three classification models showed that SVM produces the highest accuracy (83.16%),
Naïve Bayes lies in between (64.8%) and KNN produces the lowest accuracy (57.65%).
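The report does not show its preprocessing code. The minimal base-R sketch below illustrates the two steps described above on made-up toy data: dropping every pixel column that is zero in all images, then drawing a roughly 70/30 train/test split. The data frame `digits`, its `label` column and the pixel columns `p1` to `p6` are hypothetical, and the simple random split is an assumption; the report only says "about 70%".

```r
set.seed(2019)

# Hypothetical toy data: 10 "images" of 6 pixels each (values 0-255),
# where pixels p1 and p4 are zero in every image
pixels <- matrix(sample(0:255, 60, replace = TRUE), nrow = 10,
                 dimnames = list(NULL, paste0("p", 1:6)))
pixels[, c("p1", "p4")] <- 0
digits <- data.frame(label = factor(sample(0:9, 10, replace = TRUE)), pixels)

# Cleaning step: keep only pixel columns that are non-zero in at least one image
pixel_cols <- setdiff(names(digits), "label")
nonzero <- colSums(digits[pixel_cols] != 0) > 0
digits_clean <- digits[c("label", pixel_cols[nonzero])]

# Roughly 70/30 split into training and test rows (simple random sample)
train_idx <- sample(nrow(digits_clean), size = round(0.7 * nrow(digits_clean)))
train <- digits_clean[train_idx, ]
test  <- digits_clean[-train_idx, ]

names(digits_clean)                          # "label" "p2" "p3" "p5" "p6"
c(train = nrow(train), test = nrow(test))    # 7 training rows, 3 test rows
```

A pixel that is zero in every image cannot help separate one digit from another, so removing it shrinks the feature set without losing information. With ten digit classes, a stratified split (for example caret's `createDataPartition()`, which samples within each level of a factor outcome) would keep the class proportions similar in both sets; whether the report did this is not stated.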
Section 2: Naïve Bayes
Build a naïve Bayes model.
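The report does not show the code used to fit the model or name the R package it relied on. As a hedged illustration of the technique only, the base-R sketch below implements Gaussian naive Bayes from scratch: each class gets a prior probability, and within each class every pixel is modelled by its own normal distribution, with pixels treated as independent. The helper names `fit_gaussian_nb()` and `predict_gaussian_nb()`, the variance floor `eps = 1e-3` and the toy data are hypothetical assumptions, not taken from the report.

```r
# Hypothetical helper: per-class log priors, pixel means and pixel variances
fit_gaussian_nb <- function(x, y, eps = 1e-3) {
  x <- as.matrix(x)
  y <- factor(y)
  classes <- levels(y)
  k <- length(classes)
  means <- matrix(vapply(classes, function(cl) colMeans(x[y == cl, , drop = FALSE]),
                         numeric(ncol(x))), nrow = k, byrow = TRUE)
  vars <- matrix(vapply(classes, function(cl) apply(x[y == cl, , drop = FALSE], 2, var),
                        numeric(ncol(x))), nrow = k, byrow = TRUE)
  list(classes = classes,
       log_prior = log(as.numeric(table(y)) / length(y)),
       means = means,
       vars = vars + eps)  # eps stops constant pixels from giving zero variance
}

# Hypothetical helper: pick the class with the highest log posterior score
predict_gaussian_nb <- function(model, x) {
  x <- as.matrix(x)
  scores <- vapply(seq_along(model$classes), function(i) {
    mu <- model$means[i, ]
    v <- model$vars[i, ]
    sq <- sweep(sweep(x, 2, mu)^2, 2, v, "/")  # (x - mu)^2 / variance, per pixel
    model$log_prior[i] - 0.5 * rowSums(sq) - 0.5 * sum(log(2 * pi * v))
  }, numeric(nrow(x)))
  scores <- matrix(scores, nrow = nrow(x))  # keep matrix shape for a single row
  model$classes[max.col(scores, ties.method = "first")]
}

# Toy usage: three well-separated classes, five noisy pixels plus one constant pixel
set.seed(42)
n <- 60
y <- rep(c("a", "b", "c"), each = n / 3)
centers <- c(a = 0, b = 4, c = 8)
x <- cbind(matrix(rnorm(n * 5, mean = rep(centers[y], times = 5)), ncol = 5), 0)
model <- fit_gaussian_nb(x, y)
pred <- predict_gaussian_nb(model, x)
mean(pred == y)  # training accuracy on the toy data
```

Summing log-densities instead of multiplying densities avoids numerical underflow when hundreds of pixels contribute to each score.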
The results of the built model are presented below:
> confusionMatrix(t.test$label, nb_pred)
Confusion Matrix and Statistics

          Reference
Prediction  0  1  2  3  4  5  6  7  8  9
         0 18  0  1  0  0  0  0  0  1  0
         1  0 25  0  0  0  0  0  0  1  0
         2  0  0 12  0  0  0  3  0  2  0
         3  2  0  0  9  0  1  1  0  1  2
         4  0  0  0  0  9  1  1  0  2 13
         5  0  1  0  4  1  4  0  1  5  4
         6  1  0  0  0  0  0 12  0  1  0
         7  0  3  0  0  0  0  0  8  0  9
         8  0  1  1  0  0  0  0  0 16  2
         9  0  3  0  0  0  0  0  0  0 14

Overall Statistics

               Accuracy : 0.648
                 95% CI : (0.5767, 0.7147)
    No Information Rate : 0.2245
    P-Value [Acc > NIR] : < 2.2e-16

                  Kappa : 0.6087

 Mcnemar's Test P-Value : NA
Statistics by Class:

                     Class: 0 Class: 1 Class: 2
Sensitivity           0.85714   0.7576  0.85714
Specificity           0.98857   0.9939  0.97253
Pos Pred Value        0.90000   0.9615  0.70588
Neg Pred Value        0.98295   0.9529  0.98883
Prevalence            0.10714   0.1684  0.07143
Detection Rate        0.09184   0.1276  0.06122
Detection Prevalence  0.10204   0.1327  0.08673
Balanced Accuracy     0.92286   0.8757  0.91484
                     Class: 3 Class: 4 Class: 5
Sensitivity           0.69231  0.90000  0.66667
Specificity           0.96175  0.90860  0.91579
Pos Pred Value        0.56250  0.34615  0.20000
Neg Pred Value        0.97778  0.99412  0.98864
Prevalence            0.06633  0.05102  0.03061
Detection Rate        0.04592  0.04592  0.02041
Detection Prevalence  0.08163  0.13265  0.10204
Balanced Accuracy     0.82703  0.90430  0.79123
                     Class: 6 Class: 7 Class: 8
Sensitivity           0.70588  0.88889  0.55172
Specificity           0.98883  0.93583  0.97605
Pos Pred Value        0.85714  0.40000  0.80000
Neg Pred Value        0.97253  0.99432  0.92614
Prevalence            0.08673  0.04592  0.14796
Detection Rate        0.06122  0.04082  0.08163
Detection Prevalence  0.07143  0.10204  0.10204
Balanced Accuracy     0.84735  0.91236  0.76389
                     Class: 9
Sensitivity           0.31818
Specificity           0.98026
Pos Pred Value        0.82353
Neg Pred Value        0.83240
Prevalence            0.22449
Detection Rate        0.07143
Detection Prevalence  0.08673
Balanced Accuracy     0.64922
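From the results above, the Naïve Bayes model classifies 127 of the 196 test images correctly,
an accuracy of 64.8% (95% CI: 57.67% to 71.47%) with a Kappa of 0.6087. Note that caret's
confusionMatrix() expects the predicted classes as its first argument and the true classes as
its second. Because the call above passes t.test$label first, the rows labelled "Prediction"
hold the true digits and the columns labelled "Reference" hold the model's predictions.
Accuracy and Kappa do not depend on this ordering, but the class-level figures do: the reported
Sensitivity and Pos Pred Value are interchanged, as are Specificity and Neg Pred Value. For
example, 90% of the true zeros were recognised (18 of 20), while 85.7% of the images predicted
as zero really were zeros (18 of 21). Calling confusionMatrix(nb_pred, t.test$label) gives the
conventional orientation.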
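As a check on the printed output, the overall statistics can be recomputed from the confusion matrix in base R. The matrix below is transcribed from the table above; the only assumption is its orientation: since the call passes `t.test$label` first, rows are treated as true digits and columns as predictions. The 95% CI is the exact binomial interval from `binom.test()`, which is how caret computes it.

```r
# Confusion matrix transcribed from the caret output above
# (rows = true test labels, columns = Naive Bayes predictions)
cm <- matrix(c(
  18,  0,  1, 0, 0, 0,  0, 0,  1,  0,
   0, 25,  0, 0, 0, 0,  0, 0,  1,  0,
   0,  0, 12, 0, 0, 0,  3, 0,  2,  0,
   2,  0,  0, 9, 0, 1,  1, 0,  1,  2,
   0,  0,  0, 0, 9, 1,  1, 0,  2, 13,
   0,  1,  0, 4, 1, 4,  0, 1,  5,  4,
   1,  0,  0, 0, 0, 0, 12, 0,  1,  0,
   0,  3,  0, 0, 0, 0,  0, 8,  0,  9,
   0,  1,  1, 0, 0, 0,  0, 0, 16,  2,
   0,  3,  0, 0, 0, 0,  0, 0,  0, 14
), nrow = 10, byrow = TRUE,
dimnames = list(truth = as.character(0:9), predicted = as.character(0:9)))

n <- sum(cm)                                      # 196 test images
accuracy <- sum(diag(cm)) / n                     # 127 correct / 196
expected <- sum(rowSums(cm) * colSums(cm)) / n^2  # agreement expected by chance
kappa <- (accuracy - expected) / (1 - expected)   # Cohen's kappa
ci <- binom.test(sum(diag(cm)), n)$conf.int       # exact binomial 95% CI

recall    <- diag(cm) / rowSums(cm)  # share of each true digit recognised
precision <- diag(cm) / colSums(cm)  # share of each predicted digit that is correct

round(c(accuracy = accuracy, kappa = kappa), 4)   # accuracy 0.6480, kappa 0.6087
round(as.numeric(ci), 4)
round(rbind(recall, precision), 3)
```

With this orientation, `recall` reproduces the printed "Pos Pred Value" row and `precision` reproduces the printed "Sensitivity" row (for digit 0: 0.900 and 0.857), which is the swap caused by the argument order.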
Section 3: K-Nearest Neighbor method
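The report does not show its KNN code, the value of k or the distance measure. As a hedged illustration of the method only, the base-R sketch below classifies each test image by a majority vote among its k nearest training images. The helper name `knn_predict()`, the use of Euclidean distance and the default `k = 3` are assumptions, and the toy data is made up.

```r
# Hypothetical helper: k-nearest-neighbour classification by majority vote
knn_predict <- function(train_x, train_y, test_x, k = 3) {
  train_x <- as.matrix(train_x)
  test_x <- as.matrix(test_x)
  train_y <- factor(train_y)
  apply(test_x, 1, function(row) {
    # squared Euclidean distance from this test image to every training image
    d <- colSums((t(train_x) - row)^2)
    nearest <- order(d)[seq_len(k)]
    votes <- table(train_y[nearest])
    names(votes)[which.max(votes)]  # on a tie, the first factor level wins
  })
}

# Toy usage: two clusters of 2-pixel "images" labelled "0" and "1"
set.seed(7)
train_x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
                 matrix(rnorm(40, mean = 5), ncol = 2))
train_y <- rep(c("0", "1"), each = 20)
test_x <- rbind(c(0.2, -0.1), c(5.1, 4.8))
pred <- knn_predict(train_x, train_y, test_x, k = 3)
pred  # "0" "1"
```

KNN has no training phase: every prediction scans the whole training set, so prediction cost grows with the number of training images and pixels. The `knn()` function in R's recommended `class` package implements the same idea; which implementation and which k the report used is not shown.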
The results of the built model are presented below:
