Intrusion Detection using WEKA Data Analytics Technique

   

Added on  2022-11-26

Introduction
Background
Over the years, firms have had to deal with cyber threats such as ransomware. From 2017, however,
new threats began to emerge at a fast rate. One example is cryptojacking, which was motivated by
the popularity of cryptocurrency and in which business computers are hacked and used to mine
cryptocurrency [1]. Other threats include Internet of Things (IoT) device threats, geopolitical
risks, cross-site scripting, and mobile malware. With the increasing number of threats to the
cyber protection of organizations, the issue of how to protect company resources such as data and
finances against intrusion remains of concern to business executives. An article written in 2018
suggests security analytics as a solution to cyber threats: “...Security Analytics is an approach
to cybersecurity focused on the analysis of data to produce proactive security measures.” [2].
The value of security analytics lies in its ability to enable the transition from protection to
detection and to provide a unified view of the enterprise, which offers the firm a means to detect
external threats and gather intelligence [2].
Objective
In this paper, we conduct intrusion detection using WEKA data analytics techniques in order to
examine intelligent security solutions based on data analytics, and we report on our findings.
Data analytics tools and techniques
Data analytic tools
We will be using the WEKA tool on Windows 10. WEKA is a Java-based tool used both for performing
machine learning experiments and for embedding trained models in Java applications. WEKA is
therefore well suited to our research objective, which is to perform intrusion detection, given
the wide usage of Java applications in technological products including operating systems [3].
Data analytics techniques
Our main focus is to compare different intrusion detection methods. Our objective is to classify
an activity as either normal or an anomaly, which makes this a binary classification problem. As
such, we will use the Random Forest and Logistic Regression machine learning algorithms.
Random Forest
One of the best algorithms in classical machine learning is the random forest model which,
in the words of Niklas Donges, is “a flexible, easy to use machine learning algorithm
that produces, even without hyper-parameter tuning, a great result most of the time.” [4].
Random Forest is a supervised learning algorithm whose approach can be summarized as building
multiple decision trees and then merging them to obtain more accurate and stable predictions [4].
Perhaps the biggest merit of this algorithm is that it can be used for both classification and
regression problems, making it suitable when the objective is to determine how different
predictor attributes affect a response attribute and how different attributes are grouped together.
Decision trees are built with a greedy algorithm that selects the optimal split at each split
point, so trees grown on bagged samples tend to resemble one another. Random Forest improves on
bagged decision trees by disrupting this greedy splitting: each split may consider only a random
subset of the attributes. When applying the model to our data, our main focus will therefore be
the number of attributes considered at each split point.
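The idea of bagging trees while limiting the attributes available at each split can be sketched as follows. This is a minimal illustration in Python, not WEKA's implementation: the toy data is made up, each tree is reduced to a single split (a stump) to keep the example short, and the names `train_forest`, `train_stump`, `predict`, and `features_per_split` are assumptions introduced for the example.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels, default):
    """Most common label, or `default` when the list is empty."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def train_stump(rows, labels, features_per_split, rng):
    """Fit a one-split tree, searching only a random subset of attributes."""
    candidates = rng.sample(range(len(rows[0])), features_per_split)
    fallback = majority(labels, None)
    best = None  # (impurity, feature, threshold, left label, right label)
    for f in candidates:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, threshold,
                        majority(left, fallback), majority(right, fallback))
    return best[1:]


def train_forest(rows, labels, n_trees=25, features_per_split=1, seed=0):
    """Bagging: every stump is trained on its own bootstrap sample."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx],
                                  features_per_split, rng))
    return forest


def predict(forest, row):
    """Majority vote over the individual trees."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]


# Toy data (invented): [connection duration, failed logins]
X = [[1, 0], [2, 0], [1, 1], [2, 1], [9, 5], [8, 6], [9, 7], [10, 5]]
y = ["normal"] * 4 + ["anomaly"] * 4
forest = train_forest(X, y, n_trees=25, features_per_split=1)
print(predict(forest, [1, 0]), predict(forest, [9, 6]))
```

Here `features_per_split` plays the role of the number of attributes considered at each split point: setting it to the total number of attributes reduces the sketch to plain bagging, while smaller values force the trees to differ from one another.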
Logistic Regression
Logistic regression is a machine learning classification algorithm adopted when the problem
involves the need to “...assign observations to a discrete set of classes” [5] and the outcome
is binary (dichotomous). The objective of a logistic regression model is to explain the
relationship between the response (outcome) and the explanatory variables. A logistic model follows
the formula:
$$\operatorname{logit}(p) = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k$$
where $p$ is the probability that the characteristic of interest is present, $X_1, \dots, X_k$ are
the explanatory variables, and $b_0, \dots, b_k$ are the model coefficients. In addition, the
model's logit transformation is defined as:
$$\text{odds} = \frac{p}{1 - p}$$
and
$$\operatorname{logit}(p) = \ln\left(\frac{p}{1 - p}\right)$$
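A limitation of logistic regression is that, although it is a good algorithm for predicting
probabilities, it is not a very good classifier on its own, since a cut-off must still be chosen
to turn those probabilities into class labels.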
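The relationship between a probability, its log-odds, and a class label can be illustrated with a short Python sketch. The coefficients, the intercept, the 0.5 cut-off, and the function names below are assumptions chosen for illustration; they are not fitted to our data and are not taken from WEKA.

```python
import math


def logit(p):
    """Log-odds of a probability p, valid for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def logistic(z):
    """Inverse of the logit: maps any real number to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(coefficients, intercept, features):
    """Probability of the positive class under a logistic regression model."""
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    return logistic(z)


def classify(probability, threshold=0.5):
    """Turn a probability into a binary label using a cut-off."""
    return "anomaly" if probability >= threshold else "normal"


# Hypothetical coefficients, for illustration only (not fitted to any data).
coefficients = [0.8, -0.5]
intercept = -1.0
p = predict_probability(coefficients, intercept, [3.0, 1.0])
print(round(p, 3), classify(p))
```

The model itself only produces the probability `p`; the label depends on the chosen `threshold`, so changing the cut-off changes the classification without changing the fitted model.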
Data
To address our research objective, we will use observations on U2R and R2L attacks on
networks, which were collected in 2009 for application in Computational Intelligence for Security
and Defense Applications [6]. The dataset, which is divided into training and test sets, is stored
in ARFF and text formats and can be obtained from https://www.unb.ca/cic/datasets/nsl.html.
Data Analytics for Network Intrusion Detection
In this section we carry out the tasks specified in the requirements file and report on our findings.
This includes a discussion of the data file formats, feature selection for our two algorithms,
creation of training and testing sets if necessary, and finally implementation of the models and
examination of their performance.
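As a sketch of how the performance of a binary classifier can be examined, the following Python example computes accuracy, precision, and recall from actual and predicted labels. The choice of these three measures, the function names, and the label lists are assumptions made for illustration; they are not results from our experiments.

```python
def confusion_counts(actual, predicted, positive="anomaly"):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def summarize(actual, predicted, positive="anomaly"):
    """Accuracy, precision and recall, with 0.0 when a ratio is undefined."""
    tp, fp, tn, fn = confusion_counts(actual, predicted, positive)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Invented labels, for illustration only.
actual = ["anomaly", "anomaly", "normal", "normal", "normal", "anomaly"]
predicted = ["anomaly", "normal", "normal", "normal", "anomaly", "anomaly"]
print(summarize(actual, predicted))
```

Computing the same summary for each model on the same test set gives a like-for-like basis for comparing them.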
Text and ARFF file formats
As the name suggests, a text file contains textual information. Our text file is stored with a
.txt extension, which is commonly used for information intended to be opened by a wide range of
applications. An Attribute-Relation File Format (ARFF) file, on the other hand, is an ASCII text
file that describes a list of instances sharing a set of attributes [7]. ARFF is the default file
format in WEKA, and since we already have our data in the ARFF format, there is no need for
conversion.
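To make the structure of an ARFF file concrete, the sketch below parses a small ARFF document in Python. It handles only the basic layout (a `@relation` line, `@attribute` declarations, and comma-separated rows after `@data`) and ignores features such as sparse rows or attribute names containing spaces. The sample header is invented for illustration and is not the header of our dataset; `parse_arff` is a hypothetical helper, not part of WEKA.

```python
def parse_arff(text):
    """Parse a small, well-formed ARFF document into (relation, attributes, rows)."""
    relation = None
    attributes = []  # (name, type) pairs in declaration order
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        if in_data:
            values = [v.strip().strip("'\"") for v in line.split(",")]
            rows.append(dict(zip((name for name, _ in attributes), values)))
            continue
        keyword, _, rest = line.partition(" ")
        keyword = keyword.lower()  # ARFF keywords are case-insensitive
        if keyword == "@relation":
            relation = rest.strip().strip("'\"")
        elif keyword == "@attribute":
            name, _, kind = rest.strip().partition(" ")
            kind = kind.strip()
            if kind.startswith("{"):  # nominal attribute: list of allowed values
                kind = [v.strip().strip("'\"") for v in kind.strip("{}").split(",")]
            attributes.append((name.strip("'\""), kind))
        elif keyword == "@data":
            in_data = True
    return relation, attributes, rows


# Invented sample, for illustration only.
SAMPLE = """\
% Toy example, not the real dataset header
@relation toy_connections
@attribute duration real
@attribute protocol_type {tcp,udp,icmp}
@attribute class {normal,anomaly}
@data
0,tcp,normal
12,udp,anomaly
"""

relation, attributes, rows = parse_arff(SAMPLE)
print(relation, [name for name, _ in attributes], len(rows))
```

The header declares every attribute and its type once, and each data row then lists one value per attribute in the same order, which is what lets WEKA load such a file without further conversion.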
To conduct our analysis, we follow these steps: