
Anomaly Detection in Network Security using Bayesian Optimization and Machine Learning

   

Added on  2023-06-13

Abstract: Network attacks are now a familiar problem, and their
prevalence is growing rapidly. Both organizations and individuals
are concerned about the confidentiality, integrity and availability
of their critical information, which network attacks often
compromise. Several machine-learning-based intrusion detection
methods have been developed to secure network infrastructure against
such attacks. In this paper, an anomaly detection framework is
proposed that uses Bayesian Optimization to tune three classifiers:
Support Vector Machine with Gaussian kernel (SVM-RBF), Random Forest
(RF) and k-Nearest Neighbor (k-NN). Bayesian Optimization is used to
search for suitable model parameters for classifying network
intrusions. Experiments with the proposed methods were conducted on
the ISCX 2012 dataset. The results section evaluates the
effectiveness of the proposed methods in terms of accuracy,
precision, recall and false alarm rate.
Keywords: network anomaly detection, ISCX 2012, Bayesian
Optimization, machine learning.
Introduction: Computer networks are now essential to almost every
organization. Organizations depend on them to carry out their daily
business through online services, and customers need access to these
networks to share relevant information with the organizations.
Because this information is crucial, cyber security is essential to
safeguard the information associated with governments, customers and
enterprises. Enterprises must protect and manage their data against
third parties, abnormal activity and potential attackers who may try
to compromise the network. Several protective mechanisms are
available, such as firewalls, antivirus software, data encryption and
intrusion detection systems (IDSs), but they are not sufficient to
prevent all network attacks. The growing number of sophisticated
attackers makes it hard to protect all crucial information: these
attackers can breach traditional security systems, so firewalls and
antivirus software alone cannot prevent network attacks.

Based on their detection methodology, there are two main types of
IDSs: signature-based detection and anomaly-based detection. In
signature-based (misuse) detection systems, well-known intrusions are
detected by comparing the observed data with pre-defined attack
patterns, so these methods only work for known attack signatures.
Although the detection rate for well-known intrusions is high with
this method, misuse detection methods face a high false positive rate
because of the continuously changing nature of intrusions [Ref].
Anomaly detection methods, on the other hand, are based on the
hypothesis that abnormal behavior differs from normal behavior, so
any deviation from normal behavior is considered abnormal, that is,
an intrusion. Since anomaly-based IDSs build a model of normal
patterns, they are capable of detecting unknown intrusions [Ref].

Although different researchers have reported promising results and
noticeable improvements, intrusion detection remains an active and
promising research area. Organizations have to deal with large
volumes of network traffic data. The training data is also subject to
constant changes in the environment, which creates the need for
real-time detection. In addition, training sets are often
high-dimensional and contain irrelevant features, and redundant and
highly correlated features degrade the performance of IDSs. Choosing
a suitable feature subset is therefore an important design decision
that influences the detection technique.
The choice of algorithm and the setting of its parameters to optimal
values is another factor that influences the performance of the
model. This paper proposes an effective intrusion detection framework
based on optimized machine learning. The classifiers considered are
SVM with the Gaussian kernel, RF and k-NN, each combined with
Bayesian optimization to provide an accurate method for detecting
intrusions. The resulting methods are named BO-SVM, BO-RF and BO-KNN.
The performance of these methods is evaluated and compared through
experiments on the ISCX 2012 dataset, which was obtained from the
University of New Brunswick. As mentioned in Wu and Banzhaf [ref], a
robust IDS must have a high detection rate and a low false alarm
rate.
However, most existing intrusion detection techniques suffer from a
high false alarm rate. The proposed framework therefore optimizes the
machine learning models against an objective function so as to
maximize the effectiveness of the methods. The feasibility and
efficiency of the optimized methods are assessed with several
evaluation metrics, namely accuracy, precision, recall and false
alarm rate, and the performance of the three optimized methods is
also compared with that of the standard approaches.
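The evaluation metrics named above can be sketched as follows. This is a minimal illustration, assuming the usual confusion-matrix definitions (the text does not spell them out) and a binary labelling in which 1 marks an attack and 0 marks normal traffic. The function name `detection_metrics` and the sample labels are hypothetical.

```python
def detection_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and false alarm rate.

    Assumes binary labels: 1 = attack (positive), 0 = normal (negative).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # attacks caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # attacks missed

    def ratio(num, den):
        # Guard against empty classes instead of dividing by zero.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "false_alarm_rate": ratio(fp, fp + tn),
    }


# Hypothetical labels for eight flows: three attacks, five normal.
example = detection_metrics([1, 1, 1, 0, 0, 0, 0, 0],
                            [1, 1, 0, 1, 0, 0, 0, 0])
print(example)
```

Recall here plays the role of the detection rate, so a robust IDS in the sense used above would show high recall together with a low false alarm rate.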
The main contributions of this paper are as follows:
Investigating the performance of the optimized machine learning algorithms for intrusion detection.
Enhancing the performance of the classification models by identifying their optimal parameters through objective-function minimization.
Using the UNB ISCX 2012 benchmark dataset to show the effect of optimizing the machine learning models.
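The idea of finding optimal parameters by minimizing an objective function can be illustrated with a small Bayesian optimization loop. This is only a sketch under stated assumptions: the text does not say which surrogate model or acquisition function is used, so a Gaussian-process surrogate with an expected-improvement acquisition is assumed here. The objective `toy_validation_error` is a hypothetical stand-in for the validation error of a classifier as a function of one hyperparameter scaled to [0, 1]. All function names and default values are illustrative.

```python
import math
import random


def rbf(a, b, length_scale=0.2):
    """Squared-exponential covariance between two scalar inputs."""
    return math.exp(-((a - b) ** 2) / (2.0 * length_scale ** 2))


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def gp_posterior(xs, ys, x_new, noise=1e-4):
    """Posterior mean and standard deviation of the GP surrogate at x_new."""
    n = len(xs)
    offset = sum(ys) / n  # centre the targets so a zero-mean prior fits
    centered = [y - offset for y in ys]
    gram = [[rbf(xs[i], xs[j]) + (noise if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    k_star = [rbf(x, x_new) for x in xs]
    weights = solve(gram, k_star)  # K^{-1} k_star (K is symmetric)
    mean = offset + sum(w * y for w, y in zip(weights, centered))
    variance = 1.0 - sum(w * k for w, k in zip(weights, k_star))
    return mean, math.sqrt(max(variance, 1e-12))


def expected_improvement(mean, std, best):
    """Expected improvement over the best observed value (minimization)."""
    z = (best - mean) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + std * pdf


def bayes_opt(objective, low, high, n_init=3, n_iter=10, seed=0):
    """Minimize a 1-D objective on [low, high]; returns (best_x, best_y)."""
    rng = random.Random(seed)
    xs = [rng.uniform(low, high) for _ in range(n_init)]
    ys = [objective(x) for x in xs]
    grid = [low + (high - low) * i / 100 for i in range(101)]
    for _ in range(n_iter):
        best = min(ys)

        def score(x):
            mean, std = gp_posterior(xs, ys, x)
            return expected_improvement(mean, std, best)

        x_next = max(grid, key=score)  # most promising candidate
        xs.append(x_next)
        ys.append(objective(x_next))
    i_best = min(range(len(ys)), key=ys.__getitem__)
    return xs[i_best], ys[i_best]


def toy_validation_error(h):
    """Hypothetical error curve with its minimum at h = 0.3."""
    return (h - 0.3) ** 2


best_h, best_err = bayes_opt(toy_validation_error, 0.0, 1.0)
print(best_h, best_err)
```

In a setting like the one described, the objective would instead train a classifier such as SVM, RF or k-NN with the candidate parameter value and return its validation error. Each such evaluation is expensive, which is the usual reason for preferring a surrogate-guided search over an exhaustive grid.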
The remainder of this paper is organized as follows. Section II
presents the related work. The subsequent sections cover the
theoretical aspects of the techniques, the proposed optimized
methods and the experimental results. The paper concludes with a
discussion of future work.
RELATED WORK
In recent years, many researchers have treated intrusion detection as
a classification problem. They have proposed different methods based
on data mining techniques such as SVM [ref], Decision Trees [ref],
k-NN [13] and Naive Bayes [ref]. A short review of intrusion
detection with machine learning is available in Tsai et al. [1].
Motivated by the promising results achieved by applying computational
intelligence techniques to classification problems, many researchers
focus on proposing new methodologies based on computational
intelligence. A review of these approaches was given by Wu and
Banzhaf [6]. Much recent research in this domain focuses on improving
the efficiency of data mining techniques by combining them with
swarm intelligence optimization techniques.
For example, Chung and Wahid [ref] proposed a hybrid feature
selection and classification methodology that uses a dynamic-swarm-
based rough set together with simplified swarm optimization (SSO).
They focused on enhancing the performance of SSO by using a weighted
local search (WLS) strategy to find better solutions in the
neighbourhood. According to their results, this method achieved 93.3%
accuracy in classifying intrusions. Another method, proposed by Kuang
et al. [ref] for intrusion detection, combines a multi-layer SVM with
kernel principal component analysis (KPCA) and a genetic algorithm
(GA) to increase the accuracy of the model. Binary PSO and RF have
been used to detect PROBE attacks in networks [ref]. In [ref], Zhang
et al. applied RF to misuse, anomaly and hybrid network-based IDSs,
in which known intrusions are detected by signature matching and
novel intrusions are detected by an outlier detection mechanism. In
[ref], Chuang et al. introduced Catfish-BPSO, an algorithm that
applies the catfish effect to improve the performance of BPSO for
feature selection. The k-NN method with leave-one-out cross-
validation (LOOCV) serves as the fitness function.
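Using k-NN with leave-one-out cross-validation as a fitness value, as mentioned above, can be sketched as follows. This is a generic illustration rather than the cited authors' implementation. The Euclidean distance, the majority vote, the function names and the toy data are all assumptions.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict the label of `query` by majority vote of its k nearest points."""
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def loocv_accuracy(xs, ys, k=3):
    """Leave-one-out accuracy: each sample is classified by all the others."""
    hits = 0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        if knn_predict(rest_x, rest_y, xs[i], k) == ys[i]:
            hits += 1
    return hits / len(xs)


# Hypothetical two-cluster data: label 0 = normal, label 1 = attack.
points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
          [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
labels = [0, 0, 0, 1, 1, 1]
print(loocv_accuracy(points, labels, k=3))
```

In a feature-selection search, a candidate feature subset would be scored by restricting `points` to the selected columns and calling `loocv_accuracy`; higher accuracy means a fitter subset.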
THEORETIC ASPECTS OF THE TECHNIQUES
Support Vector Machines (SVM).
The support vector machine is a machine learning method that, in its
basic form, is a linear classifier for two-class problems. It seeks
the maximum-margin hyperplane that separates the positive samples
from the negative samples, based on the structural risk minimization
principle [41]. When the data are not linearly separable, a
non-linear kernel function is used to solve the problem [42,43]. The
main goal of SVM is to find an optimal separating hyperplane by
maximizing the margin between the hyperplane and the closest data
points of the training set [42,43]. The classification function of
SVM is
$f(x) = w^{T} x + b$,
where b is the bias and w is a weight vector of the same dimension
as the feature space. By adjusting b and w, we can determine the
position of the separating hyperplane. To enhance the performance of
SVMs in non-linear cases, kernel functions $K(x_i, x_j)$ have been
proposed that map the inner product $x_i^{T} x_j$ in the original
input space to $\phi(x_i)^{T} \phi(x_j)$ in some high-dimensional
feature space. In this study, we adopted the Gaussian kernel.
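The linear decision function and the Gaussian kernel can be made concrete with a short sketch. The common form K(x, z) = exp(-gamma * ||x - z||^2) is assumed for the Gaussian kernel, since the text does not give its formula. The width parameter `gamma`, the function names and the kernelized decision function built from support-vector coefficients are illustrative assumptions.

```python
import math


def linear_decision(w, x, b):
    """Linear SVM decision value f(x) = w^T x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def gaussian_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel; gamma is an assumed width parameter."""
    squared_distance = sum((a - c) ** 2 for a, c in zip(x, z))
    return math.exp(-gamma * squared_distance)


def kernel_decision(support_vectors, coefficients, b, x, gamma=0.5):
    """Kernelized decision value: sum_i c_i * K(s_i, x) + b.

    Each coefficient c_i stands for the product of a dual weight and a
    class label; the values used below are made up for illustration.
    """
    total = sum(c * gaussian_kernel(s, x, gamma)
                for s, c in zip(support_vectors, coefficients))
    return total + b


# The sign of the decision value gives the predicted class.
print(linear_decision([1.0, -1.0], [2.0, 1.0], 0.5))
print(kernel_decision([[0.0, 0.0], [2.0, 2.0]], [1.0, -1.0], 0.0, [0.1, 0.2]))
```

The kernel value is 1 for identical points and decays towards 0 as the points move apart, with `gamma` controlling how fast. A width parameter of this kind is a typical example of a model parameter that a search such as Bayesian optimization would tune.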
k-Nearest Neighbor (k-NN)
k-NN is a simple and effective technique for classifying objects
according to the closest training examples in the feature space [18].
Consider a set of observations and targets
$(x_1, y_1), \ldots, (x_n, y_n)$, where observations
$x_i \in \mathbb{R}^d$ and targets $y_i \in \{0, 1\}$; then for a given i,
