Data Analytics for Network Intrusion Detection

Added on  2023/03/31

AI Summary
This project aims to implement intrusion detection on public network data using Weka. It analyzes the NSL-KDD dataset and evaluates the performance of various classification algorithms on anomalous network traffic.

University
Semester
Cyber Security and Analytics
Student ID
Student Name
Submission Date
Table of Contents
Project Description
Data set Description
Section 1 - Data Analytic Tools and Techniques
Section 2 - Data Analytic for Network Intrusion Detection
1. Bench Mark Data
2. Select the Features
3. Create Testing and Training Samples
4. Data analytic Techniques
5. Network Intrusion
6. Performance of Intrusion Detection - Confusion Matrix
7. Limitation of over fitting
8. Use of ensemble Tools
9. Recommendation
10. Future Research Work
References
Project Description
The main aim of this project is to implement intrusion detection on public network
data using Weka. The selected NSL-KDD data set is analyzed in Weka to identify attacks
such as user-to-root, remote-to-local and probing, and the performance of various
classification algorithms on anomalous network traffic is evaluated using Weka's data
analytics techniques. The analysis also considers how the typical network protocol stack
is used by intruders to create anomalous network traffic. The analysis is carried out with
the data processing tool Weka. We investigate the relationship between the algorithms and
the network attacks. The investigation and the resulting report on the Weka analysis cover
three stages of the network intrusion detection method. Weka data analytics is used as the
method of comparative analysis for detecting network intrusions in cyber security.
Data set Description
The NSL-KDD dataset used for the Weka analysis of network intrusion detection is
provided as eight files. NSL-KDD is a data set recommended for solving some of the
inherent problems of the earlier KDD'99 data set. It is used to compare intrusion
detection methods with the aim of achieving high accuracy. The key goal is to find
anomalies in the network traffic, which supports the development of machine learning
algorithms and intrusion detection systems. The NSL-KDD dataset files used with the Weka
mining tools are:
KDDTrain+.ARFF:
The full NSL-KDD training set with binary labels, in ARFF format, for analysis with the
Weka mining tools.
KDDTrain+.TXT:
The full NSL-KDD training set including attack-type labels and difficulty levels, in CSV
format.
KDDTrain+_20Percent.ARFF:
A 20% subset of the KDDTrain+.ARFF file.
KDDTrain+_20Percent.TXT:
A 20% subset of the KDDTrain+.TXT file.
KDDTest+.ARFF:
The full NSL-KDD test set with binary labels, in ARFF format.
KDDTest+.TXT:
The full NSL-KDD test set including attack-type labels and difficulty levels, in CSV
format.
KDDTest-21.ARFF:
A subset of the KDDTest+.ARFF file that excludes records with a difficulty level of 21
out of 21.
KDDTest-21.TXT:
A subset of the KDDTest+.TXT file that excludes records with a difficulty level of 21 out
of 21.
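To make the difference between the .TXT and .ARFF variants concrete, the following is a minimal Python sketch of reading one record in the .TXT (CSV) layout, assuming 41 feature columns followed by the attack-type label and the difficulty level. The sample record is hypothetical and built only for illustration; the function names are not part of Weka or of the dataset.

```python
import csv
import io

N_FEATURES = 41  # assumption: 41 feature columns precede the label columns


def parse_record(line):
    """Split one .TXT-style record into features, attack type, difficulty and binary label."""
    fields = next(csv.reader(io.StringIO(line)))
    features = fields[:N_FEATURES]
    attack_type = fields[N_FEATURES]
    difficulty = int(fields[N_FEATURES + 1])
    # The .ARFF files carry only a binary label: normal or anomaly.
    binary_label = "normal" if attack_type == "normal" else "anomaly"
    return features, attack_type, difficulty, binary_label


def belongs_to_test21(difficulty):
    """KDDTest-21 keeps only records whose difficulty level is not 21 out of 21."""
    return difficulty != 21


# Hypothetical record for illustration only (not copied from the dataset).
sample = ",".join(["0", "tcp", "http", "SF"] + ["0"] * 37 + ["neptune", "21"])
features, attack_type, difficulty, binary_label = parse_record(sample)
print(len(features), attack_type, difficulty, binary_label)
print(belongs_to_test21(difficulty))
```

A record such as the sample above would appear in KDDTest+ but would be left out of KDDTest-21.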
Section 1 - Data Analytic Tools and Techniques
In this section, we install the data analytics platform, analyse the data and
demonstrate at least two data analysis techniques. The steps are:
Select the dataset for the data analysis platform
Select the data mining and clustering techniques
Select the knowledge flow of the analysis dataset in the Weka tool
As the first step, open Weka as shown below.
Select the Explorer and load the NSL-KDD dataset, as shown in the image below.
Once the data is loaded, open the Preprocess tab and select the NSL-KDD data file, as
shown in the image below [2].
Network intrusion detection is performed on the data with two techniques: decision tree
and clustering. The decision tree is built in Weka from the Classify tab by selecting the
J48 tree, as shown in the image below.
Under Test options, select "Use training set" for the J48 decision tree.
Number of Leaves: 615
Size of the tree: 714
Time taken to build model: 1.98 seconds
=== Evaluation on training set ===
Time taken to test model on training data: 0.16 seconds
=== Summary ===
Correctly Classified Instances 22394 99.3346 %
Incorrectly Classified Instances 150 0.6654 %
Kappa statistic 0.9864
Mean absolute error 0.0105
Root mean squared error 0.0725
Relative absolute error 2.142 %
Root relative squared error 14.6356 %
Total Number of Instances 22544
=== Detailed Accuracy by Class ===
                 TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  PRC Area  Class
                 0.993    0.006    0.992      0.993   0.992      0.986  1.000     1.000     normal
                 0.994    0.007    0.995      0.994   0.994      0.986  1.000     1.000     anomaly
Weighted Avg.    0.993    0.007    0.993      0.993   0.993      0.986  1.000     1.000
=== Confusion Matrix ===
a b <-- classified as
9643 68 | a = normal
82 12751 | b = anomaly
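The per-class figures in the detailed accuracy table can be recomputed from this confusion matrix. The following is a minimal Python sketch using the standard definitions of TP rate, FP rate, precision and F-measure; the function and variable names are illustrative and are not part of Weka.

```python
# Confusion matrix from the J48 run: outer key = actual class, inner key = predicted class.
matrix = {
    "normal": {"normal": 9643, "anomaly": 68},
    "anomaly": {"normal": 82, "anomaly": 12751},
}


def class_metrics(matrix, cls):
    """Return TP rate, FP rate, precision and F-measure for one class."""
    classes = list(matrix)
    tp = matrix[cls][cls]
    fn = sum(matrix[cls][c] for c in classes if c != cls)
    fp = sum(matrix[c][cls] for c in classes if c != cls)
    tn = sum(matrix[a][p] for a in classes for p in classes if a != cls and p != cls)
    tp_rate = tp / (tp + fn)  # also called recall
    fp_rate = fp / (fp + tn)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * tp_rate / (precision + tp_rate)
    return {
        "tp_rate": tp_rate,
        "fp_rate": fp_rate,
        "precision": precision,
        "f_measure": f_measure,
    }


metrics = {cls: class_metrics(matrix, cls) for cls in matrix}
for cls, values in metrics.items():
    print(cls, {name: round(value, 3) for name, value in values.items()})
```

Rounded to three decimals, the values agree with the normal and anomaly rows of the table.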
The visualization of the decision tree is represented below.
To perform the clustering analysis, open the Cluster tab and select the K-means clustering algorithm.
As shown in the image below, select the cluster mode and the class attribute for the K-means clusters.
kMeans
======
Number of iterations: 17
Within cluster sum of squared errors: 53944.67210266422
Initial starting points (random):
Cluster 0:
8205,tcp,telnet,SF,0,15,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,255,10,0.04,0.85,0,0,0,0,0.83,0
Cluster 1:
0,tcp,imap4,RSTO,0,138,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,2,0,0,1,1,0.5,1,1,255,37,0.15,0.03,0,0,0,0,0.42,1
=== Model and evaluation on training set ===
Clustered Instances
0 15292 (68%)
1 7252 (32%)
Class attribute: class
Classes to Clusters:
0 1 <-- assigned to cluster
9477 234 | normal
5815 7018 | anomaly
Cluster 0 <-- normal
Cluster 1 <-- anomaly
Incorrectly clustered instances: 6049.0 26.832 %
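For readers unfamiliar with the algorithm behind this output, the following is a minimal Python sketch of k-means on numeric points. It is a simplification and not Weka's implementation: it assumes purely numeric attributes, whereas the NSL-KDD records also contain nominal values such as the protocol and service. The toy points and hand-picked starting points are hypothetical.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, initial_centroids=None, seed=0, max_iter=100):
    """Alternate assignment and centroid update until the assignment stops changing."""
    if initial_centroids is None:
        # Random initial starting points, as in the output above.
        centroids = random.Random(seed).sample(points, k)
    else:
        centroids = list(initial_centroids)
    assignment = None
    iterations = 0
    for iterations in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    # Within-cluster sum of squared errors, the quantity reported in the output.
    sse = sum(squared_distance(p, centroids[a]) for p, a in zip(points, assignment))
    return assignment, centroids, sse, iterations


# Hypothetical two-dimensional toy data with two obvious groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
assignment, centroids, sse, iterations = k_means(
    points, k=2, initial_centroids=[points[0], points[3]]
)
print(assignment, iterations, round(sse, 3))
```

With random starting points the result depends on the seed, which is why the output above lists the initial starting points it used.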
Section 2 - Data Analytic for Network Intrusion Detection
1. Bench Mark Data
The Weka intrusion detection analysis is performed on benchmark data, which is first
loaded into Weka.
2. Select the Features
Open the Select attributes tab and choose CfsSubsetEval as the attribute evaluator and
BestFirst as the search method for the intrusion detection data.
The output of the feature selection is shown below.
Search Method:
Best first.
Start set: no attributes
Search direction: forward
Stale search after 5 node expansions
Total number of subsets evaluated: 490
Merit of best subset found: 0.435
Attribute Subset Evaluator (supervised, Class (nominal): 42 class):
CFS Subset Evaluator
Including locally predictive attributes
Selected attributes: 5,6,12,25,28,30,31,37,41: 9
src_bytes
dst_bytes
logged_in
serror_rate
srv_rerror_rate
diff_srv_rate
srv_diff_host_rate
dst_host_srv_diff_host_rate
dst_host_srv_rerror_rate
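The "merit" reported above is the score that correlation-based feature selection (CFS) assigns to a candidate subset. The following is a minimal Python sketch of the merit formula as it is usually stated for CFS: a subset scores higher when its features correlate with the class and lower when they correlate with each other. The correlation values used in the example are hypothetical and are not taken from this run.

```python
import math


def cfs_merit(k, mean_feature_class_corr, mean_feature_feature_corr):
    """CFS merit of a subset of k features.

    merit = k * r_cf / sqrt(k + k * (k - 1) * r_ff)
    r_cf: mean feature-class correlation, r_ff: mean feature-feature correlation.
    """
    return (k * mean_feature_class_corr) / math.sqrt(
        k + k * (k - 1) * mean_feature_feature_corr
    )


# Hypothetical correlations for a 9-feature subset, for illustration only.
low_redundancy = cfs_merit(9, 0.3, 0.1)
high_redundancy = cfs_merit(9, 0.3, 0.5)
print(low_redundancy > high_redundancy)
```

The best-first search adds or removes attributes and keeps the subset with the highest merit it has seen, stopping after a fixed number of non-improving expansions (5 in the output above).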
3. Create Testing and Training Samples
The testing and training set-up for the intrusion detection analysis [6] is shown in the image below.
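The report does not state how the samples were drawn. One common option is a random percentage split; the following is a minimal Python sketch of that idea, assuming a 66% training share and a fixed seed, both of which are illustrative choices. NSL-KDD also ships separate KDDTrain+ and KDDTest+ files, which can be used directly instead of splitting.

```python
import random


def percentage_split(records, train_percent=66, seed=1):
    """Shuffle a copy of the records and cut it into training and testing samples."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_percent / 100)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical stand-in for dataset records: 100 integer ids.
records = list(range(100))
train, test = percentage_split(records)
print(len(train), len(test))
```

Keeping the seed fixed makes the split reproducible, so different techniques can be compared on the same samples.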
4. Data analytic Techniques
The data analysis uses two techniques: decision tree and clustering [7].
Decision Tree technique
It produces the following output:
Correctly Classified Instances 22394 99.3346 %
Incorrectly Classified Instances 150 0.6654 %
Kappa statistic 0.9864
Mean absolute error 0.0105
Root mean squared error 0.0725
Relative absolute error 2.142 %
Root relative squared error 14.6356 %
Total Number of Instances 22544
Clustering technique
It produces the following output:
kMeans
======
Number of iterations: 17
Within cluster sum of squared errors: 53944.67210266422
Initial starting points (random):
Cluster 0:
8205,tcp,telnet,SF,0,15,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,255,10,0.04,0.85,0,0,0,0,0.83,0
Cluster 1:
0,tcp,imap4,RSTO,0,138,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,2,0,0,1,1,0.5,1,1,255,37,0.15,0.03,0,0,0,0,0.42,1
=== Model and evaluation on training set ===
Clustered Instances
0 15292 (68%)
1 7252 (32%)
5. Network Intrusion
The sample data describes network traffic. It contains two classes of records:
normal traffic [a] and anomalous (intrusion) traffic [b].
6. Performance of Intrusion Detection - Confusion Matrix
Here, the intrusion detection performance of the two data analytic techniques,
decision tree and clustering, is evaluated.
For the decision tree,
=== Confusion Matrix ===
a b <-- classified as
9643 68 | a = normal
82 12751 | b = anomaly
The decision tree gives:
Correctly Classified Instances - 99.3346 %
Incorrectly Classified Instances - 0.6654 %
Total Number of Instances - 22544
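The accuracy above, and the kappa statistic reported in the decision tree output, follow directly from the confusion matrix. The following is a minimal Python sketch using the standard definition of Cohen's kappa (observed agreement corrected for chance agreement); the function name is illustrative.

```python
# Rows = actual class (normal, anomaly), columns = predicted class.
matrix = [[9643, 68], [82, 12751]]


def accuracy_and_kappa(matrix):
    """Return (accuracy, Cohen's kappa) for a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Agreement expected by chance, from the row and column totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


accuracy, kappa = accuracy_and_kappa(matrix)
print(round(100 * accuracy, 4), round(kappa, 4))
```

The rounded results reproduce the 99.3346 % accuracy and the 0.9864 kappa statistic of the J48 run.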
In clustering,
Classes to Clusters:
0 1 <-- assigned to cluster
9477 234 | normal
5815 7018 | anomaly
Cluster 0 <-- normal
Cluster 1 <-- anomaly
Incorrectly clustered instances: 6049.0 26.832 %
Clustering creates two clusters, cluster 0 and cluster 1. Cluster 0 corresponds to normal traffic
and cluster 1 to anomalous traffic. 26.832% of the instances are incorrectly clustered.
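The incorrectly clustered figure can be reproduced from the classes-to-clusters table. The following is a minimal Python sketch, assuming as many classes as clusters: each cluster is mapped to a class so that the number of matching instances is as large as possible, and everything else counts as incorrectly clustered. It is a simplified illustration rather than Weka's own code.

```python
from itertools import permutations

# Classes-to-clusters counts: class -> [instances in cluster 0, instances in cluster 1].
counts = {"normal": [9477, 234], "anomaly": [5815, 7018]}


def classes_to_clusters(counts):
    """Return (class assigned to each cluster, incorrect count, incorrect percentage)."""
    classes = list(counts)
    n_clusters = len(next(iter(counts.values())))
    total = sum(sum(row) for row in counts.values())
    best_correct, best_mapping = -1, None
    # Try every one-to-one mapping of clusters to classes and keep the best one.
    for mapping in permutations(classes, n_clusters):
        correct = sum(counts[cls][cluster] for cluster, cls in enumerate(mapping))
        if correct > best_correct:
            best_correct, best_mapping = correct, mapping
    incorrect = total - best_correct
    return best_mapping, incorrect, 100 * incorrect / total


mapping, incorrect, percent = classes_to_clusters(counts)
print(mapping, incorrect, round(percent, 3))
```

Most of the error comes from the 5815 anomaly instances that fall into cluster 0.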
7. Limitation of over fitting
Overfitting refers to a model that fits the training data too closely. The model learns the
noise and detail in the training data to the extent that it negatively impacts the performance of
the model on new data. To counter this, the tree is pruned after learning to remove some of the detail it has picked up.
The range of matching materials is listed below.
8. Use of ensemble Tools
The ensemble algorithms are:
1. Bagging
2. Random Forest
3. AdaBoost
4. Voting
5. Stacking
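As an illustration of the simplest of these ideas, the following is a minimal Python sketch of majority voting, in which several classifiers each predict a label and the most frequent label wins. The three prediction lists are hypothetical and do not come from the experiments in this report.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists (all the same length) by majority vote."""
    combined = []
    for votes in zip(*predictions):
        # most_common(1) returns the label with the highest count for this instance.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Hypothetical predictions from three classifiers for three instances.
predictions = [
    ["normal", "anomaly", "anomaly"],
    ["normal", "normal", "anomaly"],
    ["anomaly", "normal", "anomaly"],
]
print(majority_vote(predictions))
```

Using an odd number of classifiers avoids ties when there are only two classes, as with normal and anomaly.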
9. Recommendation
Overfitting refers to a model that fits the training data too closely, learning noise and
detail in the training data that negatively impact its performance on new data.
It is therefore recommended that the tree be pruned after learning to remove some of the detail it has picked up.
10. Future Research Work
In future work, Naive Bayes classification will be used for network intrusion detection
on the given dataset, as it is one of the most popular data analysis
techniques. It is expected to give good results for the selected dataset.
References
[1] "Comparing EM Clustering Algorithm with Density Based Clustering Algorithm Using WEKA Tool", International Journal of Science and Research (IJSR), vol. 5, no. 7, pp. 1199-1201, 2016. Available: 10.21275/v5i7.art2016420.
[2] L. S. Katore and J. J. S. Umale, "Comparative Study of Recommendation Algorithms and Systems using WEKA", International Journal of Computer Applications, vol. 110, no. 3, pp. 14-17, 2015. Available: 10.5120/19295-0731.