Data Analytics for Cyber Security: Classifier Performance Analysis

Added on 2022/08/15

AI Summary

This report presents a comparative analysis of five different classifiers: Naïve Bayes, Decision Trees, Logistic Regression, Neural Nets, and K-Nearest Neighbors, using a Twitter dataset to classify tweets as spam or not spam. The study utilizes five performance metrics: accuracy, specificity, precision, recall, and the F1 score to evaluate the effectiveness of each classifier. The methodology involves training and testing each classifier on the provided dataset, with two testing datasets, one representing an ideal scenario and the other reflecting a more realistic distribution of spam tweets. The report includes a literature review of the classifiers and metrics, technical demonstrations of the implementation in R, and a detailed performance evaluation of the classifiers. The Neural Nets classifier is identified as the best performing model across the chosen metrics. The report concludes with insights into the strengths and weaknesses of each classifier, offering valuable information for data analytics and machine learning applications, particularly in cyber security contexts.

Comparison of Classifiers Using Different Performance Evaluation Metrics

Executive Summary
This report presents the results of an investigation into the performance of different classifiers
on the same data. The classifiers selected for evaluation are the Naïve Bayes classifier, Decision
Trees, Logistic Regression, Neural Nets and K-Nearest Neighbors. Each classifier is trained and
tested on Twitter data, where the response is whether a tweet is spam or not. Five performance
metrics are used to evaluate the models: accuracy, specificity, precision, recall and the F1 score.
The Neural Nets classifier is the best model for classifying tweets, with the best performance on
three of the five metrics on the testing dataset that resembles real-world data.
Table of Contents
Introduction
Literature Review
Classifiers
Performance Metrics
Technical Demonstration
Performance Evaluation
Conclusion
References
Appendix: Source File
Introduction
Different classifiers are available for grouping items in machine learning and in big data more
generally. Classifiers are machine-learning algorithms used to predict the group that an item is
likely to fall under (Shaffer, 2011; Vicenc, 2017). This study evaluates the performance of
different classifiers on the same dataset. The following classifiers are applied and evaluated:
the Naïve Bayes classifier, Decision Trees, Logistic Regression, Neural Nets and K-Nearest
Neighbors. Their performance is evaluated using the following metrics: accuracy, specificity,
precision, recall and the F1 score.
A dataset of tweets is used both for training the classifiers and for testing their performance.
Social media is the modern platform for both informing and communicating, which makes it well
suited to the application of machine learning (Witten, 2011). It is also a rich source of data from
which meaningful and useful inferences can be drawn. The power of social media in the current
societal setup makes its data important for social, political and economic purposes (Agozzino,
2012). The target of this study is to determine how well each classifier can identify whether a
tweet is spam or not.
The account-level features of interest in the study are the age of the Twitter account, the number
of lists that the account is in, the number of accounts that the account follows, the number of
accounts that follow the account, the number of tweets published by the account and the number of
favorites for the tweets published by the account. The tweet-level features are the number of
hashtags, retweets, URLs, favorites, mentions, characters and digits.
This study first reviews the literature on the classifiers and the performance metrics of interest,
then presents the analysis process for each classifier. The performance of each classifier is then
evaluated using the metrics. Finally, conclusions are drawn and inferences made.
Literature Review
Classifiers
A number of studies have applied classifiers in data mining. Cheolhwan and Stanislaw (2010)
explore the application of artificial neural networks to image recognition. Neural networks are
classification and prediction models that work in ways similar to the human brain and are similarly
composed of nodes and links (Cheolhwan & Stanislaw, 2010). The study concludes that, for image
recognition on large-scale data, which may find application in security, neural networks provide
the best classification algorithm, specifically Generalized Brain-State-in-a-Box Neural Networks.
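The idea of nodes connected by weighted links can be illustrated with a small forward pass in base R. This is only a sketch of a generic feed-forward network with one hidden layer of three nodes, not the Generalized Brain-State-in-a-Box model of the cited study; the weights, biases, input values and function names are all hypothetical.

```r
# Logistic activation applied at each node
sigmoid <- function(z) 1 / (1 + exp(-z))

# Hypothetical weights: 2 inputs -> 3 hidden nodes -> 1 output node
W1 <- matrix(c( 0.5, -0.3,
                0.8,  0.1,
               -0.6,  0.9), nrow = 3, byrow = TRUE)  # one row per hidden node
b1 <- c(0.1, -0.2, 0.0)                              # hidden-node biases
W2 <- c(1.2, -0.7, 0.4)                              # links from hidden nodes to output
b2 <- -0.1                                           # output-node bias

forward <- function(x) {
  hidden <- sigmoid(as.vector(W1 %*% x) + b1)  # activations of the 3 hidden nodes
  sigmoid(sum(W2 * hidden) + b2)               # output squashed into (0, 1)
}

out <- forward(c(1, 2))
```

Because the output node also uses the logistic activation, `out` always lies strictly between 0 and 1 and can be read as a class score.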
Alberto, Alfons and Enrique (2012) consider speech recognition and confidence estimation as an
area of possible application of machine learning. The study evaluates the applicability of the
Naïve Bayes classifier to the confidence estimation of words in speech recognition. Determining
the truthfulness of presented information is key to deciding how seriously a statement should be
taken, especially in security-related instances, which makes speech recognition and confidence
estimation vital. Alberto, Alfons and Enrique (2012) describe the Naïve Bayes classifier as a
classification model based on Bayes' theorem that uses the principles of conditional probability
provided in the theorem. They conclude that applying the generalized and specific Naïve Bayes
models together for statistical language modelling yields a better-performing model for speech
recognition and confidence estimation.
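The conditional-probability reasoning behind a Naïve Bayes classifier can be sketched in a few lines of base R. The priors, likelihoods, class names and feature names below are hypothetical values chosen for illustration and are not taken from the cited study.

```r
# Hypothetical class priors and per-feature likelihoods for two binary features
prior  <- c(correct = 0.7, incorrect = 0.3)
lik_f1 <- c(correct = 0.9, incorrect = 0.4)   # P(feature 1 present | class)
lik_f2 <- c(correct = 0.8, incorrect = 0.5)   # P(feature 2 present | class)

# Naive Bayes: multiply the prior by each likelihood (features assumed independent)
unnorm    <- prior * lik_f1 * lik_f2
posterior <- unnorm / sum(unnorm)             # normalisation step of Bayes' theorem

# Predicted class is the one with the largest posterior probability
predicted <- names(which.max(posterior))
```

The "naïve" part is the multiplication of per-feature likelihoods, which treats the features as independent given the class.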
Ibrahim et al. (2016) discuss the application of machine learning to the identification of abnormal
behavior on online platforms. Detection of abnormal online behavior is a key security interest,
especially with the increased reliance on cloud technology and online services. The research
proposes applying decision trees to real-time data as a means of classifying activities as either
abnormal or normal. According to Ibrahim et al. (2016), decision trees are classification
algorithms that use the concept of trees and branches to recursively partition data from its
complete form down to groups by following rules based on the features of interest. The study
concludes that, although modifications are necessary to avoid overfitting, decision trees form a
viable classifier for the detection of online anomalies.
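A single partitioning step of the kind described above can be sketched in base R. The example assumes Gini impurity as the splitting criterion, which is one common choice and not necessarily the one used in the cited study; the data, threshold and function names are hypothetical.

```r
# Gini impurity of a vector of class labels (0 means the group is pure)
gini <- function(y) {
  p <- table(y) / length(y)
  1 - sum(p^2)
}

# Hypothetical data: one numeric feature and a class label
x <- c(1, 2, 3, 10, 11, 12)
y <- c("normal", "normal", "normal", "abnormal", "abnormal", "abnormal")

# Size-weighted impurity of the two groups produced by the rule x < threshold
split_impurity <- function(threshold) {
  left  <- y[x < threshold]
  right <- y[x >= threshold]
  (length(left) * gini(left) + length(right) * gini(right)) / length(y)
}

before <- gini(y)            # impurity of the unsplit data
after  <- split_impurity(5)  # impurity after applying the rule x < 5
```

A tree-growing algorithm would repeat this search over features and thresholds, keep the rule that lowers impurity the most, and then recurse into each resulting group.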
Guarding against insider attacks is paramount in cloud technology (Subrahmanya et al., 2017).
Subrahmanya et al. (2017) note that the best way to avoid insider attacks in cloud technology is to
be able to detect whether the individual accessing the cloud is legitimate or not. To achieve this,
they suggest using the K-Nearest Neighbors classifier to classify individuals accessing the cloud.
The study describes the K-Nearest Neighbors classifier as a classification model that observes the
k nearest neighbors of a new item, as well as their classes, and uses this information to assign
the new item to the predominant class among the neighbors. Subrahmanya et al. (2017) find that the
K-Nearest Neighbors classifier provides a sufficient machine learning approach to threat detection
in clouds.
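The majority-vote rule behind K-Nearest Neighbors can be sketched in base R. This is a minimal illustration rather than the implementation used in the cited study or in the FNN package; the function name `knn_vote`, the use of Euclidean distance and the toy points are assumptions made for the example.

```r
# Minimal k-nearest-neighbours vote (illustrative; names and data are hypothetical)
knn_vote <- function(train_x, train_y, new_x, k = 3) {
  # Euclidean distance from the new item to every training row
  d <- sqrt(rowSums(sweep(train_x, 2, new_x)^2))
  # Classes of the k closest training rows
  nearest <- train_y[order(d)[seq_len(k)]]
  # Predominant class among those neighbours
  names(which.max(table(nearest)))
}

train_x <- matrix(c(1, 1,
                    1, 2,
                    2, 1,
                    8, 8,
                    9, 8,
                    8, 9), ncol = 2, byrow = TRUE)
train_y <- c("non-spammer", "non-spammer", "non-spammer",
             "spammer", "spammer", "spammer")

pred_a <- knn_vote(train_x, train_y, c(1.5, 1.5), k = 3)
pred_b <- knn_vote(train_x, train_y, c(8.5, 8.5), k = 3)
```

Here `pred_a` falls among the first three points and `pred_b` among the last three, so each takes the class of its own cluster.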
Tao and Longtao (2018) present a model for countering website vulnerability. Web vulnerability
is a security challenge for website owners, and identifying abnormal website activity is
necessary. Tao and Longtao (2018) propose a logistic regression model for classifying web traffic
as either anomalous or normal. Logistic regression is a predictive and classification regression
approach that is applied specifically when the target variable is categorical (Tao & Longtao,
2018). In their conclusion, Tao and Longtao (2018) suggest that model accuracy can be improved by
using the L-BFGS algorithm to minimize the loss function. The study also indicates that logistic
regression is efficient in detecting abnormal web traffic even if it is recently generated.
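How logistic regression turns a numeric score into a categorical prediction can be sketched in base R. The intercept, coefficient, feature values and class names below are hypothetical and are not fitted values from the cited study.

```r
# Logistic function: maps any real-valued score to a probability in (0, 1)
logistic <- function(z) 1 / (1 + exp(-z))

# Hypothetical intercept and coefficient for a single feature (not fitted values)
b0 <- -2
b1 <- 0.5
x  <- c(0, 4, 10)

p     <- logistic(b0 + b1 * x)                  # probability of the "anomaly" class
label <- ifelse(p > 0.5, "anomaly", "normal")   # classify with a 0.5 threshold
```

The middle observation has a score of exactly zero, so its probability is 0.5 and, with a strict threshold, it is labelled "normal".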
Performance Metrics
Cheng and Xiongwei (2010) consider different measures of classifier performance. According to the
study, recall refers to the ability of a classifier to identify as many of the actual positives as
possible. Amasyali and Ersoy (2011) focus on comparing different classifiers based on accuracy. The
study defines accuracy as the proportion of correct outcomes among all outcomes. Amasyali and Ersoy
(2011) describe accuracy as quintessential and fit for two-class as well as multi-class cases.
Dell et al. (2015) explore Bayesian reasoning to make the F1 score a better metric for the
performance of classifiers. They explain the F1 score as a measure that balances the recall and
precision of a classifier. Farideh, Abbas and Shahram (2017) are concerned with ways of improving
the precision of K-Nearest Neighbors classifiers. They define precision as the proportion of a
classifier's positive predictions that are true positives. Mubeen (2018) aims to determine the
specificity of a biometric-based algorithm as a way of countering fraud. According to Mubeen
(2018), specificity refers to the ability of a classifier to correctly detect the true negative
entries.
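The five metrics reviewed in this section can be written directly in terms of the four confusion-matrix counts: true positives, false positives, true negatives and false negatives. The base R sketch below uses the standard definitions; the function name `classifier_metrics` and the counts are hypothetical, and which class is treated as "positive" is a choice the analyst has to make.

```r
# Five metrics from the four confusion-matrix counts (counts below are hypothetical)
classifier_metrics <- function(tp, fp, tn, fn) {
  accuracy    <- (tp + tn) / (tp + fp + tn + fn)  # share of all predictions that are correct
  specificity <- tn / (tn + fp)                   # share of actual negatives detected
  precision   <- tp / (tp + fp)                   # share of predicted positives that are right
  recall      <- tp / (tp + fn)                   # share of actual positives detected
  f1          <- 2 * precision * recall / (precision + recall)  # harmonic mean of the two
  c(accuracy = accuracy, specificity = specificity,
    precision = precision, recall = recall, f1 = f1)
}

m <- classifier_metrics(tp = 80, fp = 10, tn = 90, fn = 20)
round(m, 4)
```

For these counts, accuracy is 170/200 = 0.85, specificity is 0.90, recall is 0.80, precision is 80/90 and the F1 score is 160/190.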
Technical Demonstration
The data on the tweets was divided into a training dataset and two testing datasets. The training
set and one of the testing datasets, testing dataset 1, had equal numbers of spammer and non-
spammer tweets, while testing dataset 2 had the ratio of spammer to non-spammer tweets at 1:19.
Testing dataset 1 represents an ideal dataset while testing dataset 2 represents a more real-world
dataset.
For all the classifiers, the confusionMatrix() function in the caret package was used to obtain
the accuracy and specificity, and the precision() and recall() functions in the same package were
used to obtain the precision and recall. The F1_Score() function in the MLmetrics package was used
to obtain the F1 score.
Table 1 below gives the code for the Naïve Bayes model, which applied the naiveBayes() function in
the e1071 package. The function was applied to a model formula with Status (spammer or
non-spammer) as the response and the training data as the data. The predict() function was then
used with the resulting model to test it on the two testing datasets.
Table 1: Naive Bayes Classifier Screenshots
Table 2 below gives the code and plot output for the Decision Trees model, which applied the
rpart() function in the rpart package. The function was applied to a model formula with Status
(spammer or non-spammer) as the response and the training data as the data. The predict() function
was then used with the resulting model to test it on the two testing datasets.
Table 2: Decision Trees Classifier Screenshots
Table 3 below gives the code and partial output for the Logistic Regression model, which applied
the glm() function. The function was applied to a model formula with Status (spammer or
non-spammer) as the response and the training data as the data. The model was refitted to exclude
the no_tweetfavorites variable, which returned NA for all model parameters, as seen in the second
row of the table. The predict() function was then used with the resulting model to test it on the
two testing datasets. Since the output of the model is a probability rather than a class label, it
was converted to 0s and 1s and then to factor form using loops (for) and conditional statements
(if) prior to performance evaluation.
Table 3: Logistic Regression Classifier Screenshots
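The conversion from predicted probabilities to class labels described for the Logistic Regression model can also be written without an explicit loop. The sketch below is an alternative under stated assumptions, not the code used in the study: the probabilities are hypothetical, and the level names follow the labels used in the appendix.

```r
# Hypothetical predicted probabilities of the "spammer" class
prob <- c(0.12, 0.91, 0.50, 0.67)

# Threshold at 0.5 and convert straight to a factor with both levels declared
pred <- factor(ifelse(prob > 0.5, "spammer", "non-spammer"),
               levels = c("non-spammer", "spammer"))
```

Declaring the levels explicitly keeps both classes present in the factor even if one of them is never predicted.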
Table 4 below gives the code and plot output for the Neural Nets model, which applied the
neuralnet() function in the neuralnet package. The function was applied to a model formula with
Status (spammer or non-spammer) as the response and the training data as the data. The
linear.output parameter was set to FALSE and the hidden parameter was set to 3, which in neuralnet
specifies a single hidden layer of three neurons. The compute() function in the neuralnet package
was then used with the resulting model to test it on the two testing datasets. Since the output of
the model is numeric rather than a class label, it was converted to 0s and 1s and then to factor
form using loops (for) and conditional statements (if) prior to performance evaluation.
Table 4: Neural Nets Classifier Screenshots
Table 5 below gives the code for the K-Nearest Neighbors model, which applied the knn() function
in the FNN package. The function was called with Status (spammer or non-spammer) as the class, the
training data as train and each of the testing datasets as test for the two tests, with k set
to 3.
Table 5: K-Nearest Neighbors Classifiers Screenshots
Performance Evaluation
Table 6 provides the screenshots showing the performance metrics for the different classifiers
on each testing dataset.
Table 6: Performance Evaluation Screenshots
[Screenshots of the metric output for each classifier (Naïve Bayes, Decision Trees, Logistic
Regression, Neural Nets, K-Nearest Neighbors) on Testing Dataset 1 and Testing Dataset 2]
Table 7 and Table 8 below present the summary of the performance evaluation results in Table 6
above.
From Table 7, we observe that, for testing dataset 1, Decision Trees is the most accurate
classifier and has the highest F1 score, Naïve Bayes is the most specific and most precise, and
Neural Nets has the highest recall.
Table 7: Testing Dataset 1 Performance Metrics
Classifier            Accuracy  Specificity  Precision  Recall  F1 Score
Naïve Bayes           0.5615    0.9680       0.8289     0.1550  0.2612
Decision Trees        0.7320    0.6980       0.7172     0.7660  0.7408
Logistic Regression   0.6875    0.7430       0.7109     0.6320  0.6691
Neural Nets           0.7065    0.5750       0.6635     0.8380  0.7406
K-Nearest Neighbors   0.6620    0.6660       0.6633     0.6580  0.6606
From Table 8, we observe that, for testing dataset 2, Naïve Bayes is the most specific and most
precise, while Neural Nets is the most accurate, with the highest recall and F1 score.
Table 8: Testing Dataset 2 Performance Metrics
Classifier            Accuracy  Specificity  Precision  Recall  F1 Score
Naïve Bayes           0.1865    0.9800       0.9928     0.1447  0.2526
Decision Trees        0.4780    0.2310       0.9799     0.7705  0.8627
Logistic Regression   0.6320    0.7700       0.9820     0.6247  0.7633
Neural Nets           0.8125    0.5800       0.9739     0.8247  0.8931
K-Nearest Neighbors   0.6565    0.6700       0.9742     0.6558  0.7839
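Because testing dataset 2 has a fixed class ratio, accuracy is a weighted average of recall and specificity, which gives a quick arithmetic check on the reported values. The base R sketch below assumes that the positive class is the majority non-spammer class, so that it makes up 19/20 of the data; this is an assumption that the very high precision values suggest, not something the report states. The column names `implied` and `consistent` and the 0.001 tolerance are choices made for the example.

```r
# Reported values for testing dataset 2
tab8 <- data.frame(
  classifier  = c("Naive Bayes", "Decision Trees", "Logistic Regression",
                  "Neural Nets", "K-Nearest Neighbors"),
  accuracy    = c(0.1865, 0.4780, 0.6320, 0.8125, 0.6565),
  specificity = c(0.9800, 0.2310, 0.7700, 0.5800, 0.6700),
  recall      = c(0.1447, 0.7705, 0.6247, 0.8247, 0.6558)
)

# Assumed share of the positive class (19 non-spammers for every spammer)
p_pos <- 19 / 20

# Accuracy implied by recall and specificity at that class share
tab8$implied    <- p_pos * tab8$recall + (1 - p_pos) * tab8$specificity
tab8$consistent <- abs(tab8$implied - tab8$accuracy) < 0.001

tab8[, c("classifier", "accuracy", "implied", "consistent")]
```

Under this assumption, four of the five rows reproduce the reported accuracy to within rounding. The Decision Trees row does not, so its reported accuracy and specificity on testing dataset 2 are worth re-checking.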
Conclusion
The analysis in this study reveals that for an ideal dataset of tweets, such as testing dataset 1
(with equal numbers of spammer and non-spammer tweets), the best classifier would be either the
Decision Trees classifier or the Naïve Bayes classifier, each having the top values in two of the
five metrics. For a real-world dataset, such as testing dataset 2 (with a majority of the tweets
being non-spammer tweets), the best classifier would be the Neural Nets classifier, which has the
top values in three of the five metrics. Therefore, the Neural Nets classifier is the best model
for identifying spammer and non-spammer tweets. However, if the interest were in a model with the
best specificity and precision, then the Naïve Bayes classifier would be the best choice, since it
has the highest values for the two metrics on both the ideal dataset, testing dataset 1, and the
real-world dataset, testing dataset 2.
References
Agozzino, A 2012, 'Building a Personal Relationship Through Social Media: A Study of
Millenial Students's Brand Engagement', Ohio Communication Journal, vol.50, no.1, pp. 528-
543.
Alberto, S, Alfons, J & Enrique, V 2012, 'A Word-Based Naïve Bayes Classifier for Confidence
Estimation in Speech Recognition', IEEE Transactions on Audio, Speech, and Language
Processing, vol.20, no.2, pp. 565 - 574.
Amasyali, M & Ersoy, O 2011, 'Comparison of single and ensemble classifiers in terms of
accuracy and execution time', 2011 International Symposium on Innovations in Intelligent
Systems and Applications, vol.12, no.1, p.34-40, viewed 30 January 2020, IEEE.
Cheng, W & Xiongwei, Y 2010, 'Measure identification of classifier performance', The 2nd
International Conference on Information Science and Engineering, vol.4, no.23, p. 1-20, viewed
30 January 2020, IEEE.
Cheolhwan, OH & Stanislaw, ZH 2010, 'Large-Scale Pattern Storage and Retrieval Using
Generalized Brain-State-in-a-Box Neural Networks', IEEE Transactions on Neural Networks,
vol.21, no.4, pp. 633-643.
Dell, Z, Jun, W, Xiaoxue, Z & Xiaoling, W 2015, 'A Bayesian Hierarchical Model for
Comparing Average F1 Scores', 2015 IEEE International Conference on Data Mining , vol. 1,
no.4, p. 7-16, viewed 30 January 2020, IEEE.
Farideh, S, Abbas, H & Shahram, G 2017, 'Improving the precision of KNN classifier using
nonlinear weighting method based on the spline interpolation', 2017 7th International Conference
on Computer and Knowledge Engineering (ICCKE), vol.24, no.7, p.35-41, viewed 30 January
2020, IEEE.
Ibrahim, D, Kaan, G, Mustafa, S, Lemi, B & Suleyman, KS 2016, 'Online Anomaly Detection
With Nested Trees', IEEE Signal Processing Letters , vol.23, no.12, pp. 1867 - 1871.
Mubeen, S 2018, 'Sensitivity and Specificity Analysis of Fingerprints Based Algorithm', 2018
International Conference on Applied and Engineering Mathematics (ICAEM), vol.3, no.11, p.17-
29, viewed 30 January 2020, IEEE.
Shaffer, CA 2011, Data Structures and Algorithms Analysis, 3rd edn, Dover, Mineola.
Subrahmanya, S, Srinivas, Y, Abhiram, M, Lakshminarayana, U, Sahithi, P & Rojee, J 2017,
'Insider Threat Detection with Face Recognition and KNN User Classification', 2017 IEEE
International Conference on Cloud Computing in Emerging Markets (CCEM), vol.45, no.13,
p.56-68, viewed 30 January 2020, IEEE.
Tao, L & Longtao, Z 2018, 'Application of Logistic Regression in WEB Vulnerability Scanning',
2018 International Conference on Sensor Networks and Signal Processing (SNSP), vol.15, no.2,
p.75-91, viewed 30 January 2020, IEEE.
Vicenc, T 2017, Studies in Big Data, 1st edn, Springer International Publishing, Chicago.
Witten, IH 2011, Data Mining: Practical Machine Learning Tools, 3rd edn, Morgan Kaufmann,
Sydney.
Appendix: Source File
#Loading Data
#Training Set
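#stringsAsFactors=TRUE keeps status as a factor (the read.csv default became FALSE in R 4.0.0)
training_data <- read.csv("D:/FileStorage/Docs/dataset/training_data.txt", header=FALSE,
comment.char="#", stringsAsFactors=TRUE)
#Testing Sets
testing_data1 <- read.csv("D:/FileStorage/Docs/dataset/testing_data1.txt", header=FALSE,
stringsAsFactors=TRUE)
testing_data2 <- read.csv("D:/FileStorage/Docs/dataset/testing_data2.txt", header=FALSE,
stringsAsFactors=TRUE)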
#Naming Columns
VarNames <- c("account_age", "no_follower", "no_following", "no_userfavorites", "no_lists",
"no_tweets", "no_retweets", "no_tweetfavorites", "no_hashtags", "no_usermentions",
"no_urls", "no_char", "no_digits", "status")
colnames(training_data) <- VarNames
colnames(testing_data1) <- VarNames
colnames(testing_data2) <- VarNames
#==============================================================================
#Naive Bayes Classifier
#Loading Packages
library(caret)
library(e1071)
library(MLmetrics)
#Training Model
NBModel <- naiveBayes(status~., data = training_data)
NBModel
#Testing Model
#Test 1
NBPred1 <- predict(NBModel, newdata = testing_data1)
#Evaluation Metrics
#Accuracy and Specificity
confusionMatrix(NBPred1, testing_data1$status)
#Precision
precision(NBPred1, testing_data1$status)
#Recall
recall(NBPred1, testing_data1$status)
#F1 Score
F1_Score(testing_data1$status, NBPred1)
#Test 2
NBPred2 <- predict(NBModel, newdata = testing_data2)
#Evaluation Metrics
#Accuracy and Specificity
confusionMatrix(NBPred2, testing_data2$status)
#Precision
precision(NBPred2, testing_data2$status)
#Recall
recall(NBPred2, testing_data2$status)
#F1 Score
F1_Score(testing_data2$status, NBPred2)
#==============================================================================
#Decision Trees
#Loading Packages
library(rpart)
library(rpart.plot)
#Training Model
DCModel <- rpart(status~., data = training_data, method = "class")
prp(DCModel, type = 1, extra = 1, under = TRUE, split.font = 1, varlen = -10)
#Testing Model
#Test 1
DCPred1 <- predict(DCModel, newdata = testing_data1, type = "class")
#Evaluation Metrics
#Accuracy and Specificity
confusionMatrix(DCPred1, testing_data1$status)
#Precision
precision(DCPred1, testing_data1$status)
#Recall
recall(DCPred1, testing_data1$status)
#F1 Score
F1_Score(testing_data1$status, DCPred1)
#Test 2
DCPred2 <- predict(DCModel, newdata = testing_data2, type = "class")
#Evaluation Metrics
#Accuracy and Specificity
confusionMatrix(DCPred2, testing_data2$status)
#Precision
precision(DCPred2, testing_data2$status)
#Recall
recall(DCPred2, testing_data2$status)
#F1 Score
F1_Score(testing_data2$status, DCPred2)
#==============================================================================
#Logistic Regression
#Training Model
LRModel <- glm(status~., data = training_data, family = "binomial")
options(scipen = 999)
summary(LRModel)
#Remodelling
LRModel1 <- glm(status~., data = training_data[, -8], family = "binomial")
options(scipen = 999)
summary(LRModel1)
#Testing Model
#Test 1
LRPred1 <- predict(LRModel1, newdata = testing_data1[, c(-8,-14)], type = "response")
LRPred1 <- ifelse(LRPred1 > 0.5, 1, 0)
for(i in 1:length(LRPred1))
{
if(LRPred1[i] == 0)
{
LRPred1[i] <- "non-spammer"
}
if(LRPred1[i] == 1)
{
LRPred1[i] <- "spammer"
}
}
LRPred1 <- factor(LRPred1)
#Evaluation Metrics
#Accuracy and Specificity
confusionMatrix(LRPred1, testing_data1$status)
#Precision
precision(LRPred1, testing_data1$status)
#Recall
recall(LRPred1, testing_data1$status)
#F1 Score
F1_Score(testing_data1$status, LRPred1)
#Test 2
LRPred2 <- predict(LRModel1, newdata = testing_data2[, c(-8,-14)], type = "response")
LRPred2 <- ifelse(LRPred2 > 0.5, 1, 0)
for(i in 1:length(LRPred2))
{
if(LRPred2[i] == 0)
{
LRPred2[i] <- "non-spammer"
}
if(LRPred2[i] == 1)
{
LRPred2[i] <- "spammer"
}
}
LRPred2 <- factor(LRPred2)
#Evaluation Metrics
#Accuracy and Specificity
confusionMatrix(LRPred2, testing_data2$status)
#Precision
precision(LRPred2, testing_data2$status)
#Recall
recall(LRPred2, testing_data2$status)
#F1 Score
F1_Score(testing_data2$status, LRPred2)
#==============================================================================
#NeuralNets
#Loading Packages
library(neuralnet)
#Training Model
nnModel <- neuralnet(status~., data = training_data, linear.output = F, hidden = 3)
nnModel$weights
plot(nnModel,rep = "best")
#Testing Model
#Test 1
nnPred <- compute(nnModel, testing_data1[, -14])
predicted.class=apply(nnPred$net.result,1,which.max)-1
for(i in 1:length(predicted.class))
{
if(predicted.class[i] == 0)
{
predicted.class[i] <- "non-spammer"
}
if(predicted.class[i] == 1)
{
predicted.class[i] <- "spammer"
}
}
predicted.class <- factor(predicted.class)
#Evaluation Metrics
#Accuracy and Specificity
confusionMatrix(predicted.class, testing_data1$status)
#Precision
precision(predicted.class, testing_data1$status)
#Recall
recall(predicted.class, testing_data1$status)
#F1 Score
F1_Score(testing_data1$status, predicted.class)
#Test 2
nnPred2 <- compute(nnModel, testing_data2[, -14])
predicted.class2=apply(nnPred2$net.result,1,which.max)-1
for(i in 1:length(predicted.class2))
{
if(predicted.class2[i] == 0)
{
predicted.class2[i] <- "non-spammer"
}
if(predicted.class2[i] == 1)
{
predicted.class2[i] <- "spammer"
}
}
predicted.class2 <- factor(predicted.class2)
#Evaluation Metrics
#Accuracy and Specificity
confusionMatrix(predicted.class2, testing_data2$status)
#Precision
precision(predicted.class2, testing_data2$status)
#Recall
recall(predicted.class2, testing_data2$status)
#F1 Score
F1_Score(testing_data2$status, predicted.class2)
#==============================================================================
#K-Nearest Neighbors
#Loading Packages
library(FNN)
#Training Model 1 and Test 1
KnnModel1 <- knn(train = training_data[,-14], test = testing_data1[,-14],
cl = training_data[, 14], k = 3)
#Evaluation Metrics
#Accuracy and Specificity
confusionMatrix(KnnModel1, testing_data1$status)
#Precision
precision(KnnModel1, testing_data1$status)
#Recall
recall(KnnModel1, testing_data1$status)
#F1 Score
F1_Score(testing_data1$status, KnnModel1)
#Training Model 2 and Test 2
KnnModel2 <- knn(train = training_data[,-14], test = testing_data2[,-14],
cl = training_data[, 14], k = 3)
#Evaluation Metrics
#Accuracy and Specificity
confusionMatrix(KnnModel2, testing_data2$status)
#Precision
precision(KnnModel2, testing_data2$status)
#Recall
recall(KnnModel2, testing_data2$status)
#F1 Score
F1_Score(testing_data2$status, KnnModel2)