MN623 - Data Analytics for Network Intrusion Detection Report

Added on  2023/03/31

This report focuses on implementing intrusion detection on public network data, using WEKA for evaluation. It analyzes the NSL-KDD dataset to identify attack types, including user-to-root, remote-to-local, and probing attacks, and evaluates the performance of different classification techniques on anomalous network traffic. The data is processed in WEKA, and the relationship between the algorithms and the network attacks is investigated. The report covers data analytic tools and techniques, benchmark data, feature selection, testing and training samples, and performance evaluation using a confusion matrix. It also addresses limitations such as overfitting and suggests future research directions, including the use of ensemble tools and further exploration of classification techniques for effective network intrusion detection.
University
Semester
Cyber Security and Analytics
Student ID
Student Name
Submission Date
Table of Contents
Project Description
Data set Description
Section 1 - Data Analytic Tools and Techniques
Section 2 - Data Analytic for Network Intrusion Detection
1. Bench Mark Data
2. Select the Features
3. Create Testing and Training Samples
4. Data analytic Techniques
5. Network Intrusion
6. Performance of Intrusion Detection - Confusion Matrix
7. Limitation of over fitting
8. Use of ensemble Tools
9. Recommendation
10. Future Research Work
References
Project Description
The main aim of this project is to implement intrusion detection on public network data
and to evaluate it with WEKA. The selected NSL-KDD data set is analyzed in WEKA to identify
attack classes such as user-to-root, remote-to-local and probing, and the performance of
various classification techniques on anomalous network traffic is compared using WEKA data
analytics. We also analyze how typical network intrusions create anomalous network traffic,
and we investigate the relationship between the algorithms and the network attacks. All
analysis is done with the data processing tool WEKA. The investigation and the resulting
report cover three stages of the network intrusion detection method, with WEKA serving as
the basis for a comparative analysis of techniques for detecting network intrusions.
Data set Description
The NSL-KDD data set used for the WEKA analysis of network intrusion detection is provided
as eight files. NSL-KDD is a data set recommended for solving some of the inherent problems
of the earlier KDD'99 data set. It is used for comparing intrusion detection methods in
terms of the accuracy they achieve. The key goal is to find anomalies in the network
traffic by developing machine learning algorithms and intrusion detection systems with the
WEKA mining tools. The NSL-KDD data set files are:
KDDTrain+.ARFF:
The full NSL-KDD training set with binary labels, in ARFF format, for use with the WEKA
mining tools.
KDDTrain+.TXT:
The full NSL-KDD training set in CSV format, including the attack type labels and the
difficulty level.
KDDTrain+_20Percent.ARFF:
A 20% subset of the KDDTrain+.ARFF file.
KDDTrain+_20Percent.TXT:
A 20% subset of the KDDTrain+.TXT file.
KDDTest+.ARFF:
The full NSL-KDD test set with binary labels, in ARFF format.
KDDTest+.TXT:
The full NSL-KDD test set in CSV format, including the attack type labels and the
difficulty level.
KDDTest-21.ARFF:
A subset of the KDDTest+.ARFF file that does not include records with a difficulty level
of 21 out of 21.
KDDTest-21.TXT:
A subset of the KDDTest+.TXT file that does not include records with a difficulty level
of 21 out of 21.
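The attack type labels in the .TXT files can be grouped into the attack classes named in the project description. The following is a minimal sketch of such a grouping in plain Python. The mapping is an assumption based on the grouping commonly used for the NSL-KDD training labels; it is not taken from this report. It also includes a denial-of-service (DoS) group that the report does not discuss, and the names ATTACK_CATEGORIES and categorize are illustrative only.

```python
# Assumed grouping of NSL-KDD training-set attack labels (not from the report).
ATTACK_CATEGORIES = {
    "DoS": ["back", "land", "neptune", "pod", "smurf", "teardrop"],
    "Probe": ["ipsweep", "nmap", "portsweep", "satan"],
    "R2L": ["ftp_write", "guess_passwd", "imap", "multihop",
            "phf", "spy", "warezclient", "warezmaster"],
    "U2R": ["buffer_overflow", "loadmodule", "perl", "rootkit"],
}

# Invert the table so that a label can be looked up directly.
LABEL_TO_CATEGORY = {
    label: category
    for category, labels in ATTACK_CATEGORIES.items()
    for label in labels
}


def categorize(label):
    """Map an attack type label to its class; labels not in the table are 'unknown'."""
    if label == "normal":
        return "normal"
    return LABEL_TO_CATEGORY.get(label, "unknown")


for example in ["normal", "rootkit", "guess_passwd", "portsweep"]:
    print(example, "->", categorize(example))
```

Labels that occur only in the test files are not covered by this table and fall through to "unknown", so the table would need extending before it is applied to KDDTest+.TXT.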
Section 1 - Data Analytic Tools and Techniques
In this section, we install the data analytics platform and demonstrate it using at
least two data analysis techniques. The steps are:
Select the data set for the data analysis platform
Select the data mining and clustering techniques
Select the knowledge flow for analysing the data set in the WEKA tool
As the first step in the process, open WEKA as shown below.
Load the NSL-KDD data set by selecting the Explorer, as shown in the image below.
To load the data, open the Preprocess tab and select the NSL-KDD data file directly, as
shown in the image below [2].
Network intrusion detection on the data is carried out with two techniques: a decision tree
and cluster mining. For the decision tree, open the Classify tab and select the J48 tree
classifier, as shown in the image below.
Under Test options, select "Use training set" for the J48 decision tree.
Number of Leaves: 615
Size of the tree: 714
Time taken to build model: 1.98 seconds
=== Evaluation on training set ===
Time taken to test model on training data: 0.16 seconds
=== Summary ===
Correctly Classified Instances 22394 99.3346 %
Incorrectly Classified Instances 150 0.6654 %
Kappa statistic 0.9864
Mean absolute error 0.0105
Root mean squared error 0.0725
Relative absolute error 2.142 %
Root relative squared error 14.6356 %
Total Number of Instances 22544
=== Detailed Accuracy by Class ===
                 TP Rate  FP Rate  Precision  Recall   F-Measure  MCC      ROC Area  PRC Area  Class
                 0.993    0.006    0.992      0.993    0.992      0.986    1.000     1.000     normal
                 0.994    0.007    0.995      0.994    0.994      0.986    1.000     1.000     anomaly
Weighted Avg.    0.993    0.007    0.993      0.993    0.993      0.986    1.000     1.000
=== Confusion Matrix ===
a b <-- classified as
9643 68 | a = normal
82 12751 | b = anomaly
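The summary statistics and per-class rates reported above follow directly from the confusion matrix. As a minimal sketch (plain Python, not WEKA's own code; the function and variable names are illustrative), the figures for the normal class can be recomputed from the four cell counts:

```python
import math

# Confusion matrix cells copied from the J48 output above,
# treating "normal" as the positive class.
TP, FN = 9643, 68      # actual normal:  classified as normal / anomaly
FP, TN = 82, 12751     # actual anomaly: classified as normal / anomaly


def binary_metrics(tp, fn, fp, tn):
    """Return common evaluation metrics for a 2x2 confusion matrix."""
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # identical to the TP rate
    fp_rate = fp / (fp + tn)
    f_measure = 2 * precision * recall / (precision + recall)
    # Cohen's kappa: agreement beyond what the row/column totals give by chance.
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / (n * n)
    kappa = (accuracy - expected) / (1 - expected)
    # Matthews correlation coefficient.
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "fp_rate": fp_rate,
        "f_measure": f_measure,
        "kappa": kappa,
        "mcc": mcc,
    }


metrics = binary_metrics(TP, FN, FP, TN)
for name, value in metrics.items():
    print(f"{name:10s} {value:.4f}")
```

Rounded to the precision that WEKA prints, these values agree with the reported accuracy (99.3346 %), kappa statistic (0.9864) and MCC (0.986).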
The visualization of the decision tree is represented below.
To perform the cluster analysis, choose the Cluster tab and select the k-means clustering algorithm.
As shown in the image below, set the cluster mode to classes-to-clusters evaluation and select the class attribute.
kMeans
======
Number of iterations: 17
Within cluster sum of squared errors: 53944.67210266422
Initial starting points (random):
Cluster 0:
8205,tcp,telnet,SF,0,15,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,255,10,0.04,0.85,0,0,0,0,0.83,0
Cluster 1:
0,tcp,imap4,RSTO,0,138,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,2,0,0,1,1,0.5,1,1,255,37,0.15,0.03,0,0,0,0,0.42,1
=== Model and evaluation on training set ===
Clustered Instances
0 15292 (68%)
1 7252 (32%)
Class attribute: class
Classes to Clusters:
0 1 <-- assigned to cluster
9477 234 | normal
5815 7018 | anomaly
Cluster 0 <-- normal
Cluster 1 <-- anomaly
Incorrectly clustered instances: 6049.0 26.832 %
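The "Incorrectly clustered instances" figure can be reproduced from the classes-to-clusters table. The sketch below is plain Python and only illustrates the idea: it tries every one-to-one assignment of clusters to classes and keeps the one with the fewest errors. WEKA's internal procedure may differ in detail, and the function name is illustrative.

```python
from itertools import permutations

CLASSES = ["normal", "anomaly"]
# Rows: actual class; columns: cluster 0, cluster 1 (from the output above).
COUNTS = [
    [9477, 234],     # normal
    [5815, 7018],    # anomaly
]


def best_cluster_mapping(counts):
    """Return (assignment, errors, total) for the cluster-to-class mapping with fewest errors."""
    total = sum(sum(row) for row in counts)
    n = len(counts)
    best = None
    # assignment[cluster] is the index of the class assigned to that cluster.
    for assignment in permutations(range(n)):
        correct = sum(counts[assignment[c]][c] for c in range(n))
        errors = total - correct
        if best is None or errors < best[1]:
            best = (assignment, errors)
    return best[0], best[1], total


assignment, errors, total = best_cluster_mapping(COUNTS)
for cluster, cls in enumerate(assignment):
    print(f"Cluster {cluster} <-- {CLASSES[cls]}")
print(f"Incorrectly clustered instances: {errors} ({100 * errors / total:.3f} %)")
```

With cluster 0 mapped to normal and cluster 1 to anomaly, the errors are the 234 normal instances in cluster 1 plus the 5815 anomaly instances in cluster 0, which gives the 6049 instances (26.832 %) reported above.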
Section 2 - Data Analytic for Network Intrusion Detection
1. Bench Mark Data
The WEKA analysis for intrusion detection on the benchmark is performed by loading the
benchmark data.
2. Select the Features
Click the Select attributes tab and choose CfsSubsetEval as the attribute evaluator and
BestFirst as the search method for the intrusion detection data.
The output of the feature selection is shown below.
Search Method:
Best first.
Start set: no attributes
Search direction: forward
Stale search after 5 node expansions
Total number of subsets evaluated: 490
Merit of best subset found: 0.435
Attribute Subset Evaluator (supervised, Class (nominal): 42 class):
CFS Subset Evaluator
Including locally predictive attributes
Selected attributes: 5,6,12,25,28,30,31,37,41: 9
src_bytes
dst_bytes
logged_in
serror_rate
srv_rerror_rate
diff_srv_rate
srv_diff_host_rate
dst_host_srv_diff_host_rate
dst_host_srv_rerror_rate
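Once the subset is known, each record can be reduced to the nine selected attributes. The sketch below is plain Python, independent of WEKA, and uses the 1-based attribute indices from the output above. As an example record it uses the Cluster 0 starting point from the k-means output in Section 1, with its run of sixteen zero-valued fields written compactly. The names SELECTED and project are illustrative.

```python
# 1-based attribute indices and names, copied from the "Selected attributes" output.
SELECTED = {
    5: "src_bytes",
    6: "dst_bytes",
    12: "logged_in",
    25: "serror_rate",
    28: "srv_rerror_rate",
    30: "diff_srv_rate",
    31: "srv_diff_host_rate",
    37: "dst_host_srv_diff_host_rate",
    41: "dst_host_srv_rerror_rate",
}

# Cluster 0 starting point from the k-means output (41 feature values, no class).
RECORD = (
    "8205,tcp,telnet,SF,0,15,"
    + "0," * 16      # attributes 7 to 22 are all zero in this record
    + "1,1,0,0,0,0,1,0,0,255,10,0.04,0.85,0,0,0,0,0.83,0"
)


def project(record_line, selected=SELECTED):
    """Keep only the selected attributes of a comma-separated record."""
    values = record_line.split(",")
    # Convert the 1-based WEKA index to a 0-based list position.
    return {name: values[index - 1] for index, name in selected.items()}


reduced = project(RECORD)
for name, value in reduced.items():
    print(f"{name}: {value}")
```

Values are kept as strings here; a real pipeline would convert the numeric attributes before training.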
3. Create Testing and Training Samples
The testing and training setup for the intrusion detection analysis [6] is shown in the image below.
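One common way to create training and testing samples is a random percentage split. The sketch below is plain Python and rests on assumptions: the report does not state the split ratio, so 66 % (the default offered by the WEKA Explorer for "Percentage split") is assumed, and the seed and function name are illustrative. The instance count of 22544 is taken from the classifier output in Section 1.

```python
import random


def percentage_split(n_instances, train_percent=66.0, seed=1):
    """Shuffle instance indices and split them into training and testing index lists."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)   # fixed seed keeps the split reproducible
    n_train = round(n_instances * train_percent / 100)
    return indices[:n_train], indices[n_train:]


# 22544 is the total number of instances reported in Section 1.
train_idx, test_idx = percentage_split(22544)
print(len(train_idx), "training instances,", len(test_idx), "testing instances")
```

Evaluating on held-out instances like these, rather than on the training set itself, gives a more realistic estimate of accuracy and helps reveal overfitting.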
4. Data analytic Techniques
The data analysis uses two techniques: decision tree and cluster mining [7].
Decision tree technique
It produces the following output:
Correctly Classified Instances 22394 99.3346 %
Incorrectly Classified Instances 150 0.6654 %
Kappa statistic 0.9864
Mean absolute error 0.0105
Root mean squared error 0.0725
Relative absolute error 2.142 %
Root relative squared error 14.6356 %
Total Number of Instances 22544
Clustering technique
It produces the following output:
kMeans
======
Number of iterations: 17
Within cluster sum of squared errors: 53944.67210266422
Initial starting points (random):
Cluster 0:
8205,tcp,telnet,SF,0,15,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,255,10,0.04,0.85,0,0,0,0,0.83,0
Cluster 1:
0,tcp,imap4,RSTO,0,138,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,2,0,0,1,1,0.5,1,1,255,37,0.15,0.03,0,0,0,0,0.42,1