Machine Learning Classification: Algorithms and Performance Report

Added on  2022/11/09

AI Summary
This report presents a detailed analysis of machine learning classification techniques, focusing on the performance of various algorithms on a human activity recognition dataset. The study evaluates four classification methods: K-Nearest Neighbor (KNN), Elastic Net, Support Vector Machine (SVM) with an RBF kernel, and Random Forest. The report provides an executive summary, introduction to machine learning concepts (supervised and unsupervised learning), and a discussion of the data collection process, involving 30 volunteers performing static and dynamic activities. Each classification technique is assessed using accuracy and F1-score metrics, with the optimal hyperparameter settings for each model. The results reveal that SVM (RBF kernel) achieves the highest accuracy and F1-score, outperforming the other algorithms. The report concludes with a discussion of the findings, highlighting the strengths and weaknesses of each method, and suggests future directions for improving classification accuracy by exploring additional algorithms.
Running Head: MACHINE LEARNING
Machine Learning
Name of the Student:
Name of the University:
Course ID:
Executive Summary:
It is important to understand what machine learning does and how it can be used for the betterment of
society. Machine learning is a set of computational tools and statistical techniques for predicting a desired
outcome, or classifying the outcomes of a particular variable, based on its interaction with other variables
in a dataset. It is a part of Artificial Intelligence in which algorithms are constructed to train a model
that produces the outcome. Many techniques exist for making predictions from previously collected data.
Here, different classification techniques are applied to the given dataset to determine the best and worst
algorithms for it, with performance measured by accuracy and F1-score.
Contents
Introduction
Supervised Learning
Un-Supervised Learning
Discussion
K-Nearest Neighbour classification
Elastic Net classification
Support Vector Machine (RBF kernel) classification
Random Forest classification
Conclusion
References
Introduction
Machine learning is a subfield of AI that gives a system the ability to learn automatically
from previous data and to improve from experience without human involvement (Dietterich 2000). Its
main objective is the development of algorithms that can access data
and then learn for themselves (Kotsiantis et al. 2007). There are two main types of machine learning algorithms:
Supervised Learning
Un-Supervised Learning
Supervised Learning
Supervised learning deals with labelled data. Here the model learns from the data the pattern that corresponds
to the desired output. Supervised learning is further divided into classification and regression algorithms
(Bradley 1997).
Un-Supervised Learning
In un-supervised learning the dataset used to train the model is neither classified nor labelled; instead, the
model attempts to determine the unknown structure of the data by grouping samples whose features are similar
(Williams et al. 2006). Un-supervised learning is further divided into clustering and association.
Discussion
The data were collected to understand human physical behaviour: the main purpose is to
examine human body motion with different sensors and to gather data from which a person's activity
can be analysed and predicted.
Static activities were performed: i) sitting, ii) standing and iii) laying down.
Dynamic activities were performed: i) walking, ii) walking downstairs and iii) walking upstairs.
30 volunteers aged 19 to 48 years took part in these activities.
The training data consist of 70% of the instances and the testing data of the remaining 30% of the total
collected data.
In total, 561 features are used to represent each instance.
The extracted features include different frequency bands, frequency skewness and the angle between vectors.
A Support Vector Machine (SVM) classifier is used on the dataset and evaluated by its recall and precision percentages.
The maximum accuracy achieved with SVM classification is 96%.
In this assignment, four classification methods were applied: K-Nearest Neighbour
classification, Elastic Net classification, Support Vector Machine (RBF kernel) classification and, finally,
Random Forest classification (Domingos 2012).
For each classifier, the accuracy and F1-score were obtained from the output of the model.
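The two metrics used throughout can be sketched in plain Python as follows. This is a minimal illustration, not the report's own code: the report does not state how the F1-score was averaged over the six activities, so the macro average (unweighted mean of per-class F1) is an assumption, and the labels below are made up for the example.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1-scores (averaging scheme assumed)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Hypothetical labels, for illustration only.
y_true = ["sitting", "sitting", "walking", "walking", "standing", "standing"]
y_pred = ["sitting", "standing", "walking", "walking", "standing", "standing"]
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
```

Both functions return values between 0 and 1; the scores in this report are expressed as percentages, i.e. multiplied by 100.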
K-Nearest Neighbour classification
For the KNN classification, the best k-value for our model is 10, which gives the highest F1-score.
Accuracy Score: 90.66847641669494
F1-Score: 90.38079349608216
Fig 1- cross_val_score vs k-value
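One way such a k-value could be selected is sketched below with scikit-learn's cross_val_score. The synthetic data from make_classification, the candidate k-values, the 5-fold setting and the macro-F1 scoring are all assumptions made for illustration; the report does not state its exact procedure.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the activity data (assumption: 6 classes, 20 features).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=10,
    n_classes=6, n_clusters_per_class=1, random_state=0,
)

k_values = [1, 3, 5, 10, 15, 20]  # hypothetical candidate values
mean_scores = {}
for k in k_values:
    model = KNeighborsClassifier(n_neighbors=k)
    # Mean macro F1 over 5 cross-validation folds for this k.
    scores = cross_val_score(model, X, y, cv=5, scoring="f1_macro")
    mean_scores[k] = scores.mean()

best_k = max(mean_scores, key=mean_scores.get)
print("best k:", best_k, "mean F1:", round(mean_scores[best_k], 4))
```

Plotting mean_scores against k_values would give a curve of the kind described by the caption above.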
Elastic Net classification
For the Elastic Net classification, the best value of alpha is 1e-4 and the best value of l1_ratio is 0.5.
Accuracy Score: 95.07974211062097
F1-Score: 95.05911267493121
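The report does not name the estimator behind "Elastic Net classification". The sketch below assumes a linear classifier trained with scikit-learn's SGDClassifier and an elastic-net penalty, where alpha scales the penalty and l1_ratio mixes its L1 and L2 parts. The grid values, the synthetic data and the feature scaling step are likewise assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the activity data (assumption).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=10,
    n_classes=6, n_clusters_per_class=1, random_state=0,
)

# Scale the features, then fit a linear model with an elastic-net penalty.
pipeline = make_pipeline(
    StandardScaler(),
    SGDClassifier(penalty="elasticnet", max_iter=1000, tol=1e-3, random_state=0),
)
param_grid = {
    "sgdclassifier__alpha": [1e-5, 1e-4, 1e-3, 1e-2],  # hypothetical grid
    "sgdclassifier__l1_ratio": [0.15, 0.5, 0.85],      # hypothetical grid
}
# Every (alpha, l1_ratio) pair is scored by 5-fold cross-validated macro F1.
search = GridSearchCV(pipeline, param_grid, scoring="f1_macro", cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 4))
```

The per-combination scores in search.cv_results_["mean_test_score"] are what a plot of F1-score against alpha and l1_ratio would display.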
Support Vector Machine (RBF kernel) classification
For SVM (RBF kernel), the best value of gamma is 1e-3 and the best value of C is 1000.
Accuracy Score: 96.57278588394978
F1-Score: 96.57675195456977
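To give a feel for what gamma controls, the standard RBF kernel K(x, z) = exp(-gamma * ||x - z||^2) can be written out in a few lines. The two 561-dimensional vectors below are hypothetical and chosen only to show the effect of gamma; they are not taken from the dataset.

```python
import math

def rbf_kernel(x, z, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance between x and z)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

# Two hypothetical feature vectors that differ by 0.5 in every coordinate.
x = [0.0] * 561
z = [0.5] * 561

for gamma in (1e-3, 1e-1):
    print(gamma, rbf_kernel(x, z, gamma))
```

With gamma = 1e-3 these two points still have a similarity of about 0.87, whereas with gamma = 1e-1 it falls below 1e-6. In a space with several hundred features, squared distances grow quickly, so a small gamma keeps the kernel from treating every pair of points as unrelated. C is the separate penalty on margin violations; a large value such as 1000 tolerates few training errors.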
Fig 2- f1 score with respect to alpha and l1-ratio values
Random Forest classification
For the Random Forest classifier, the best tree depth is 300 and the best number of trees is 700.
Accuracy Score: 90.63454360366474
F1-Score: 90.33819409388762
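A depth limit as large as 300 may not actually constrain the trees, since a tree cannot be deeper than its number of training samples minus one and typically stops far earlier. The sketch below checks the depths that fitted trees reach, using scikit-learn's RandomForestClassifier on synthetic data. The data and the reduced tree count (50 rather than 700, to keep the run short) are assumptions, and the depths on the real dataset may differ.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the activity data (assumption).
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=10,
    n_classes=6, n_clusters_per_class=1, random_state=0,
)

# max_depth=300 as reported; n_estimators reduced from 700 for speed.
forest = RandomForestClassifier(n_estimators=50, max_depth=300, random_state=0)
forest.fit(X, y)

# Depth actually reached by each fitted tree in the ensemble.
depths = [tree.get_depth() for tree in forest.estimators_]
print("deepest tree:", max(depths), "allowed:", forest.max_depth)
```

If the deepest tree stays well under the limit, as it must here with only 300 samples, the depth setting has no effect and the number of trees is the hyperparameter doing the work.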
Fig 3- f1 score with respect to gamma and C values
From these observations it can be concluded that:
SVM (RBF kernel) shows the best performance on the given dataset, as its accuracy and F1-score
are the highest of the four classifiers.
It is effective in high-dimensional spaces.
It works best when the classes are separable.
The hyperplane is determined only by the support vectors, so outliers have less impact.
The Random Forest classifier performed worst of the four classifiers, marginally below K-Nearest Neighbour.
The main disadvantages of random forest are its complexity and its prediction procedure, which is time
consuming compared with other algorithms. These are possible reasons why the accuracy and F1-score of random
forest are lower than those of the other classifiers (Liaw and Wiener 2002). Another drawback is that random
forest models are black boxes that are very hard to interpret.
For our model SVM produces the highest accuracy, but for still higher accuracy we might consider applying other
classification algorithms to our dataset.
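The observation above that only the support vectors determine the hyperplane can be illustrated with a small experiment: fit an SVM, drop one training point that is not a support vector, refit, and compare predictions. This is a sketch on two synthetic, well-separated clusters generated with make_blobs (an assumption), using the gamma and C values reported earlier.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated synthetic clusters (assumption, for illustration only).
X, y = make_blobs(
    n_samples=200, centers=[[-3, -3], [3, 3]], cluster_std=1.0, random_state=0
)
model = SVC(kernel="rbf", gamma=1e-3, C=1000).fit(X, y)

n_support = len(model.support_)
print("support vectors:", n_support, "of", len(X), "training points")

# Refit after removing one point that is not a support vector.
non_support = np.setdiff1d(np.arange(len(X)), model.support_)
keep = np.delete(np.arange(len(X)), non_support[0])
refit = SVC(kernel="rbf", gamma=1e-3, C=1000).fit(X[keep], y[keep])

same = np.array_equal(model.predict(X), refit.predict(X))
print("predictions unchanged:", same)
```

Points far from the decision boundary, including most outliers on the correct side, are not support vectors, which is why removing them leaves the fitted classifier essentially as it was.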
Conclusion
This paper discussed different machine learning classification algorithms. Machine learning is one of the
latest trends in technology and is used by big tech giants such as Google, Amazon and many other
companies. On the dataset provided, four different classification techniques were applied,
of which SVM gives the best accuracy, which was our desired goal. Surface
plots were drawn in 3-D space with the F1-score on the z axis. Finally, we discussed why SVM performed
more accurately. In future, more classification techniques will be applied to this dataset to obtain
better accuracy and F1-score.
Fig 4- f1 score with respect to tree depth and no. of trees
References
Bradley, A.P., 1997. The use of the area under the ROC curve in the evaluation of machine learning algorithms.
Pattern Recognition, 30(7), pp.1145-1159.
Dietterich, T.G., 2000, June. Ensemble methods in machine learning. In International Workshop on Multiple
Classifier Systems (pp. 1-15). Springer, Berlin, Heidelberg.
Domingos, P.M., 2012. A few useful things to know about machine learning. Communications of the ACM, 55(10), pp.78-87.
Kotsiantis, S.B., Zaharakis, I. and Pintelas, P., 2007. Supervised machine learning: A review of classification
techniques. Emerging Artificial Intelligence Applications in Computer Engineering, 160, pp.3-24.
Liaw, A. and Wiener, M., 2002. Classification and regression by randomForest. R News, 2(3), pp.18-22.
Williams, N., Zander, S. and Armitage, G., 2006. A preliminary performance comparison of five machine
learning algorithms for practical IP traffic flow classification. ACM SIGCOMM Computer Communication
Review, 36(5), pp.5-16.