Intrusion Detection on SCADA

Added on  2023/03/24

AI Summary
This report discusses the rise in security concerns due to the integration of SCADA systems with corporate networks and the internet. It introduces two datasets for SCADA intrusion detection system and provides a framework for training and testing algorithms. The report also covers the threats to SCADA systems and the importance of intrusion detection.

Abstract
SCADA systems monitor and control industrial control systems in many industries and economic
sectors. Security concerns have risen because of their newfound connectivity to corporate
networks and the Internet. This report makes one primary contribution to researchers and
industry: two datasets are introduced to support intrusion detection system (IDS) research for
SCADA systems. The datasets include network traffic captured on a gas pipeline. IDS researchers
currently lack a common framework for training and testing proposed algorithms.
Table of Contents
Abstract
CHAPTER 1: INTRODUCTION
1.1 Background
1.2 Research Contributions
1.3 Organisation
CHAPTER 2: LITERATURE REVIEW
2.1 SCADA System Threats
2.2 Intrusion Detection
2.3 SCADA Datasets and test beds
CHAPTER 3: GAS PIPELINE DATASET
3.1 Introduction
3.2 Previous work
3.3 Gas pipeline system
3.4 Dataset Collection Methodology
3.5 Dataset Description
3.5.1 Raw Dataset
3.5.2 ARFF dataset
CONCLUSION
REFERENCES
CHAPTER 1: INTRODUCTION
1.1 Background
Supervisory Control and Data Acquisition (SCADA) systems manage and control critical utilities
such as railroads, pipelines and power plants. In the past these systems were isolated from
other networks, but they have now been integrated with corporate networks and the Internet.
This integration has increased organisations' control over their systems and has produced cost
savings. The new connections also raise security concerns that need to be analysed. A
vulnerability in any of these systems may allow an attacker to exploit it and take full control
of the SCADA system. Such control can cause hardware failure and endanger human lives.
SCADA systems provide visualisation and control of critical infrastructure systems. They are
composed of four parts: the sensors and actuators, the programmable logic controllers (PLCs),
the supervisory control, and the human machine interface (HMI). Sensors are devices that
collect information about a system. Actuators, such as motors and pumps, control the state of
the system. PLCs manage the collected data that represents the state of the system; these
controllers can also be considered remote terminal units (RTUs). The master terminal unit (MTU)
manages these controllers and communicates with them over protocols such as Fieldbus, Profibus,
Distributed Network Protocol version 3 (DNP3) and Modbus. The HMI is the final level. An
operator uses it to view the information collected by the MTU. The HMI presents the state of
the system and its subsystems, and it exchanges parameters with the MTU so that the operator
can interact with the SCADA system. A simple SCADA system is represented below:
Figure 1: Simple SCADA system
According to "Corporate Network Interconnection and Security Aspects of SCADA", these systems
were designed to be robust, open, and easy to operate and modify when necessary. Whether they
were secure enough was not established at the time. Three factors weaken their security. The
first is the lack of authentication in the protocols used by SCADA systems, which makes it
possible to spoof the data received by the RTU and the MTU (Dell Security Annual Threat Report,
2015). The second is security through obscurity: the belief that, because these systems use
specialised protocols and equipment, no outsider will be able to operate them the way their
operators do. The final factor is the notion that no intruder can harm the system because it is
physically secure. These factors leave infrastructure systems exposed and in need of cyber
security protections.
Researchers are studying the security of SCADA systems so that they can address some of these
weaknesses with specific solutions. Stuxnet, an attack carried out in Iran in 2010, targeted
uranium enrichment plants by attacking the Siemens Step 7 software. This software is used to
program PLCs, the digital devices that control industrial systems. Stuxnet was introduced into
the Windows environment and began searching for the Siemens software. According to "How Stuxnet
Is Rewriting the Cyberterrorism Playbook", once it had identified the software, Stuxnet was able
to obtain the data it needed and put the system into a critical state. It did this by rewriting
the firmware and the ladder logic on the PLC, which allowed the attacker to force false
responses from the PLC.
SCADA systems have also been targeted by Flame, malware that collects surveillance information.
Flame is similar to Stuxnet in that it infects Windows-based systems. The distinguishing
difference is that Flame does not focus on doing harm; it focuses on collecting data and
streaming it to a control server (Boyer and Stuart, 2014). The data is then filtered and the
results are presented to the operator. This attack was used in Iran to acquire information
about other states.
Aurora, an experiment by Idaho National Laboratory, was demonstrated to the government to show
the seriousness of these attacks. It was run on a test setup that replicated the controls of a
power system. The attack targeted the control system and opened and closed the circuit
breakers. A minor change in the operating cycle put the generator out of phase, and the end
goal was a completely destroyed generator. The attack has not been replicated on a real system,
but it succeeded in getting the government's attention and it increased development in
industrial control system (ICS) security.
An intrusion detection system (IDS) can detect attacks and alert operators so that they can
prevent further damage to the system. An IDS is an essential part of the security of any
communication-based system, and it is well suited to monitoring and analysing system
conditions. In SCADA systems, IDSs are trained with data logs that represent actual traffic. A
dataset that can be used to improve such IDSs is therefore required.
1.2 Research Contributions
This report makes one primary contribution to industry and researchers: two datasets that
replace a previous one, the Gao dataset, which was not suitable for IDS research. The datasets
contain network transactions between the MTU and the RTU in Mississippi State University's
in-house SCADA gas pipeline. The new datasets were collected with a novel framework that
replicates real attacks and operator activity on the gas pipeline. When compared with the
previous dataset, the issues that affected it were found to be resolved.
The features fall into three categories: network data, payload information and labels. The
network data gives an intrusion detection system a communication pattern to be trained and
tested against. SCADA systems have fixed network topologies and repetitive communication
between nodes, so they do not behave like information technology (IT) networks. This regularity
is favourable for an IDS and makes abnormal activity easier to detect. The second category is
payload information, which describes the state of the gas pipeline, its parameters and so on.
These values are enough to understand how the system is performing and to tell whether it is in
a critical state.
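As a rough illustration of the three feature categories, the sketch below models one dataset record in Python. Every field name in it (address, function_code, length, pressure, setpoint, pump_on, label) is a hypothetical placeholder chosen for illustration and is not the dataset's actual schema.

```python
from dataclasses import dataclass, asdict


@dataclass
class NetworkFeatures:
    # Hypothetical network-level fields.
    address: int        # station address of the RTU
    function_code: int  # protocol function code
    length: int         # frame length in bytes


@dataclass
class PayloadFeatures:
    # Hypothetical process-state fields.
    pressure: float
    setpoint: float
    pump_on: bool


@dataclass
class Record:
    network: NetworkFeatures
    payload: PayloadFeatures
    label: str  # e.g. "normal" or an attack category name


def flatten(record: Record) -> dict:
    """Flatten a record into a single feature dict, one column per feature."""
    row = {}
    row.update(asdict(record.network))
    row.update(asdict(record.payload))
    row["label"] = record.label
    return row


example = Record(NetworkFeatures(4, 3, 16), PayloadFeatures(8.5, 10.0, True), "normal")
```

Keeping the label separate from the network and payload groups makes it easy to drop it from the inputs when a detector is trained without supervision.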
The datasets are intended to aid researchers in assessing the performance of SCADA IDSs, using
realistic patterns of SCADA attacks and HMI operations. SCADA systems have long lifetimes and
therefore fixed interaction patterns. Because the datasets capture these general
characteristics, they can be used for SCADA IDS research in general.
1.3 Organisation
The next chapter covers threats to SCADA systems, IDSs for critical infrastructure systems, and
a review of SCADA test beds and datasets. It explains why these datasets are important and how
they can be useful. The third chapter describes the gas pipeline system that was used to create
the datasets, together with the methodology and framework that were applied. Two further
sections of the third chapter describe the two datasets that were created: the raw network
transaction data, and the dataset derived from it. Another section of the chapter discusses the
earlier dataset that these datasets improve on. The final chapter presents the conclusions of
this research.
CHAPTER 2: LITERATURE REVIEW
2.1 SCADA System Threats
SCADA network traffic makes it easier for researchers to study and develop IDSs. SCADA systems
are becoming increasingly exposed to external parties, which concerns professionals in the
field. An overview of security for process control discusses the value of industrial control
systems, sets out the security challenges facing SCADA, and offers guidance for meeting them
(Almalawi et al., 2014). It also covers the various types of security threat to SCADA systems.
"Challenges and Direction toward Secure Communication" discusses SCADA security issues together
with smart grid technologies. It explains in detail that open standard protocols are becoming
vulnerable to cyber-attacks. These protocols were designed for isolated networks, so security
was not considered, because they were not meant to be connected to larger networks. Hong and
Lee also identify issues with intrusion detection systems: to detect abnormal activity, an IDS
needs examples of the network traffic patterns it is expected to find. A dataset that
represents a real SCADA system, including its distinctive traffic, is required to train an IDS
customised for SCADA applications. Kang et al. (2009) describe various problems of SCADA
systems. The table below lists various attacks and the systems they target.
These attacks are carried out to gain access to the servers that manage the SCADA system. Once
a server is compromised, the attacker can use the workstations that operate the main process.
Valentine et al. discuss what can happen once a system is compromised. They also show that the
ladder logic of PLCs fails to protect against various errors, and they discuss intentional as
well as unintentional errors at the application level. Their results show the need for
validation and verification tools that give PLCs another layer of protection. Like Hong and
Lee, Dzung et al. (2005) found a large number of problems in communication networks for
industrial applications and provided a list of those common to the application domain.
Industrial control systems can be protected with various conventional and emerging
technologies. A common recommendation is the intrusion detection system, which is essential for
providing real-time information about normal and abnormal activity. Intrusion detection systems
are discussed in the next section.
2.2 Intrusion Detection
These systems collect and analyse system activity data in order to monitor the status of a
system. They also examine the state of the system and perform integrity checks on its files.
Many IDSs use machine learning algorithms to detect activity that appears abnormal for a given
system. Many also use signature-based detection, which compares observed activity against known
threats. A well-designed detection system can combine these features to provide an effective
layer of protection against various attacks.
There are three types of IDS. The first is the network intrusion detection system (NIDS). It
uses signature-based detection to determine whether activity in the system is within normal
bounds or matches an entry in a database of known attacks. When the NIDS finds a signature
match, the activity is reported to the administrator or operator. The NIDS only provides a
warning; it does not block the ongoing traffic (Sugwon and Myongho, 2010). The second is the
network node intrusion detection system (NNIDS), which is more effective for communication
between a single bus system and a control station. It is similar to the NIDS, but it adds
behaviour analysis to pattern recognition. Each control station needs an algorithm tailored to
its specific functions, and this specialisation gives the system a higher level of data
security. The last layer of security sits in the subsystem and is called the host intrusion
detection system (HIDS). It analyses the actual state of the system and performs integrity
checks on it, which helps to determine whether abnormal activity is present that may affect the
whole process. System states show low variability, so changes in processes can be easily
detected by an installed IDS.
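The integrity checks performed by a HIDS can be sketched as a comparison of file digests against a trusted baseline. The example below is a minimal illustration under assumptions: file contents are passed in as in-memory bytes rather than read from disk, and the file names used in the usage note are hypothetical.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(data).hexdigest()


def build_baseline(files: dict) -> dict:
    """Record a trusted digest for every monitored file (name -> bytes)."""
    return {name: fingerprint(data) for name, data in files.items()}


def integrity_check(baseline: dict, files: dict) -> list:
    """Return the sorted names of files that changed, appeared or disappeared."""
    # A file that is new or modified has no matching digest in the baseline.
    changed = [n for n, d in files.items() if baseline.get(n) != fingerprint(d)]
    # A file that was in the baseline but is no longer present is also reported.
    missing = [n for n in baseline if n not in files]
    return sorted(changed + missing)
```

For instance, after building a baseline from `{"ladder_logic.bin": ..., "hmi.cfg": ...}`, altering the bytes of `ladder_logic.bin` makes `integrity_check` return `["ladder_logic.bin"]`, which the HIDS would report to the operator.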
IDS solutions have a number of limitations. One is the false positive rate caused by noise in
normal activity. Noise can be introduced by a malformed packet or by malfunctioning hardware;
it may be analysed as abnormal and reported to the operator as a threat. A large number of
false reports reduces the overall effectiveness of the IDS, because real threats can be masked
and warnings end up being ignored. Another very common problem is the regular need to update
signatures. Updating every system is essential, but operators sometimes overlook it, which
leaves the whole system vulnerable. A further limitation is that an IDS cannot secure a system
that has poor authentication and unauthenticated protocols, because of spoofing. For example,
spoofing creates problems where a system is recording pressure data. This example can be
compared with the Aurora attack, which was developed for a power system and put its generator
in a dangerous situation. In that attack it was possible to place the generator in an
out-of-phase state, which could be very harmful to the system. The final limitation concerns
the analysis of encrypted traffic. An IDS cannot perform deep packet inspection on encrypted
packets; the traffic must be decrypted before any inspection is attempted. This adds processing
time and can affect the ability of the IDS to operate in real time. Despite these limitations,
an IDS still has an important role in providing proper security to networks.
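The false positive problem described above can be quantified from labelled traffic. The following is a minimal sketch, assuming each packet carries a ground-truth flag and a detector verdict (True meaning "attack"); the function name and the layout of the returned dictionary are illustrative choices.

```python
def detection_metrics(y_true, y_pred):
    """Compute detection rate, false positive rate and false negative rate.

    y_true and y_pred are equal-length sequences of booleans; True means "attack".
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # attacks that were flagged
    fn = sum(1 for t, p in pairs if t and not p)      # attacks that were missed
    fp = sum(1 for t, p in pairs if not t and p)      # normal traffic flagged
    tn = sum(1 for t, p in pairs if not t and not p)  # normal traffic passed
    attacks, normals = tp + fn, fp + tn
    return {
        "detection_rate": tp / attacks if attacks else 0.0,
        "false_positive_rate": fp / normals if normals else 0.0,
        "false_negative_rate": fn / attacks if attacks else 0.0,
    }
```

With two attacks of which one is caught, and four normal packets of which one is flagged, this gives a detection rate of 0.5 and a false positive rate of 0.25. Since normal traffic usually far outnumbers attacks, even a small false positive rate can produce many false reports.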
IDSs are commonly used in computer networks and in antivirus software. They play an essential
role in securing personal computers and web servers, and they are also a field of research for
ICS professionals. There are several reasons for implementing IDSs in SCADA systems. A common
one is that highly critical infrastructure depends on specialised protocols that were designed
for ease of use and reliability (Introduction to Industrial Control Networks, 2012), and these
factors were given priority over security. These systems also depend on operators and need
automated approaches to monitor normal system activity properly. Several studies have proposed
broader approaches for improving the security of these systems.
One IDS approach for SCADA systems was presented in "An Unsupervised Anomaly-Based Detection
Approach for Integrity Attacks on SCADA Systems". It proposed that an unsupervised learning
algorithm would work best on a SCADA network, and the theory was tested with data from a real
industrial system, a water plant. Several pre-processing techniques were applied to the input
to improve the results and to control the noise in the water plant data. The data was then
passed through a clustering algorithm as a behavioural analysis technique. The study concluded
that this behavioural approach is promising and can achieve high detection results in this
field. With the fixed-width clustering algorithm, it achieved a maximum detection rate of 90%
with a false negative rate of 0.01%. Remaining concerns include the complexity and running time
of the algorithm and the dataset it was evaluated on. Another automated approach is described
in "Improving Security for SCADA sensor networks with reputation systems and self-organised
maps". According to Moya, unsupervised learning algorithms were practical to use even allowing
for the substantial processing power these techniques require. It was important to train the
algorithm with a dataset of normal activity, or of both normal and abnormal activity.
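To make the idea of fixed-width clustering concrete, the sketch below shows a generic single-pass version. It is not the cited study's implementation: the width, the rule that flags points in sparsely populated clusters, and the `min_fraction` threshold are all assumptions made for illustration.

```python
import math


def fixed_width_clustering(points, width):
    """Single-pass fixed-width clustering.

    Each point joins the nearest cluster if that centroid lies within `width`;
    otherwise it starts a new cluster. Returns (centroids, counts, assignments).
    """
    centroids, counts, assignments = [], [], []
    for p in points:
        best, best_dist = None, float("inf")
        for i, c in enumerate(centroids):
            d = math.dist(p, c)
            if d < best_dist:
                best, best_dist = i, d
        if best is not None and best_dist <= width:
            n = counts[best]
            # Update the centroid as the running mean of its members.
            centroids[best] = tuple(
                (ci * n + pi) / (n + 1) for ci, pi in zip(centroids[best], p)
            )
            counts[best] = n + 1
            assignments.append(best)
        else:
            centroids.append(tuple(p))
            counts.append(1)
            assignments.append(len(centroids) - 1)
    return centroids, counts, assignments


def flag_anomalies(counts, assignments, min_fraction=0.1):
    """Flag points whose cluster holds less than `min_fraction` of all points."""
    total = len(assignments)
    return [counts[a] / total < min_fraction for a in assignments]
```

Because SCADA traffic is repetitive, normal readings tend to pile up in a few dense clusters, so a reading far from all of them ends up alone in a small cluster and is flagged. Traffic that is spoofed to look normal falls inside a dense cluster and is not caught by this rule.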
When spoofing is involved, traffic that is identical to normal traffic is a general problem,
because the attacker issues seemingly legitimate requests in the same format as normal traffic.
In this approach, quantization errors are relied on to place spoofed packets in anomalous
clusters.
Many real-life systems already use existing products rather than the novel approaches described
above. These products rely on signature databases or on rules written by an operator. One
example is the Snort IDS, a type of NIDS that analyses and logs SCADA network traffic in real
time. Snort can examine network packets and perform deep packet inspection, exploring the
information within a packet's payload. Such products are driven by a rule set defined by the
operator of the system. The rules in the rule set can be written by a professional in the field
or taken from a signature database. Snort is the gold standard among open source NIDS: it has
over 100,000 users and has been downloaded millions of times. Because it is free, many
companies can install it to add a layer of security to their systems (Simply Modbus, 2015).
This type of IDS is efficient against known attacks. Its disadvantage is that it cannot detect
attacks that resemble normal traffic.
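Rule-based payload inspection of the kind described above can be sketched in a few lines. This is a simplified Python analogue and not Snort's rule language or engine: the `Rule` fields, the rule identifier and the byte pattern are hypothetical, and real rules support many more options. Port 502 is the registered Modbus/TCP port, and function code 0x06 is the Modbus "write single register" request.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    sid: int         # hypothetical rule identifier
    msg: str         # alert text shown to the operator
    dst_port: int    # destination port the rule applies to
    content: bytes   # byte pattern searched for in the payload


def inspect(dst_port: int, payload: bytes, rules) -> list:
    """Return an alert for every rule whose port and content both match."""
    return [
        f"[{r.sid}] {r.msg}"
        for r in rules
        if r.dst_port == dst_port and r.content in payload
    ]


# Illustrative rule: unit 4 receiving a "write single register" request.
RULES = [Rule(1000001, "Modbus write to unit 4", 502, b"\x04\x06")]

# Modbus/TCP frame: transaction id, protocol id, length, unit, function, address, value.
write_frame = b"\x00\x01\x00\x00\x00\x06\x04\x06\x00\x01\x00\x0a"
read_frame = b"\x00\x01\x00\x00\x00\x06\x04\x03\x00\x01\x00\x02"
```

`inspect(502, write_frame, RULES)` raises one alert, while `inspect(502, read_frame, RULES)` raises none. The second case also shows the stated weakness: a request that matches no signature passes silently even if it was sent by an attacker.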
Another product that can be used as an IDS is Bro. Bro is used more commonly in commercial
systems than in research, and it can be adapted to work with almost any computer-based
communication protocol. For SCADA, Bro can be adapted to the DNP3 protocol to build a
specification-based IDS for that protocol. Lin et al. (2013) show that Bro can be used in a
SCADA system that uses DNP3, a communication protocol commonly used in SCADA-type systems. The
detection used by Bro is similar to Snort's. The one difference is that Bro uses known attack
signatures whereas Snort uses a rule set. Lin uses Bro on SCADA traffic to identify semantics
and to validate the DNP3 protocol. The IDS detects attacks such as replay and denial of
service, protects against attacks that put the system into unstable states, and protects
against attacks that create cyclic redundancy check errors. Denial of service can be detected
just by observing patterns, whereas identifying unstable-state attacks requires knowledge of
the system. Both Bro and Snort require a signature database. Almalawi et al. (2014) instead
used machine learning algorithms to learn the difference between normal and anomalous
behaviour. These learning algorithms do not require a database. A machine learning algorithm
(MLA) can be trained against a dataset, after which the data is grouped into clusters
automatically. Both approaches can be tested with a completely independent dataset. The next
section discusses the datasets and test beds that are available to researchers and the need for
the proposed dataset.
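Detecting cyclic redundancy check errors amounts to recomputing the checksum of each block and comparing it with the transmitted value. The sketch below assumes the CRC-16/DNP parameters (polynomial 0x3D65, processed in bit-reflected form as 0xA6BC, initial value 0, result inverted) and assumes the CRC is sent low byte first; the helper names are hypothetical, and this illustrates the check rather than the cited Bro analyzer.

```python
def crc16_dnp(data: bytes) -> int:
    """CRC-16/DNP: reflected polynomial 0xA6BC, initial value 0, output inverted."""
    crc = 0x0000
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right one bit; apply the polynomial when the dropped bit was 1.
            crc = (crc >> 1) ^ 0xA6BC if crc & 1 else crc >> 1
    return crc ^ 0xFFFF


def append_crc(block: bytes) -> bytes:
    """Append the two CRC bytes to a block, low byte first (assumed byte order)."""
    crc = crc16_dnp(block)
    return block + bytes([crc & 0xFF, crc >> 8])


def crc_ok(frame: bytes) -> bool:
    """Check that the trailing two bytes match the CRC of the preceding block."""
    block, tail = frame[:-2], frame[-2:]
    return crc16_dnp(block) == (tail[0] | (tail[1] << 8))
```

A frame built with `append_crc` passes `crc_ok`, and flipping any single bit in it makes the check fail, which is the condition an IDS would report as a CRC error.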
2.3 SCADA Datasets and test beds
SCADA datasets and test beds can be used to analyse the performance of an IDS. IDS researchers
lack a common framework for training and testing proposed algorithms. This has limited research
progress and has prevented proposed IDSs from being compared properly. Many of the datasets
used by researchers do not contain all types of attack, and if not all attack patterns are
considered it becomes difficult to gauge the performance of an IDS. Almalawi and Moya each used
a distinct dataset to test the performance of their IDS. Almalawi's research used data from a
water treatment plant. The dataset came from a real-world system, but attacks were not
simulated against the live system, and it is unclear which attacks the dataset contains. These
unknowns make it hard to gauge the effectiveness of his IDS. Moya did not discuss the data used
in his research in depth. His dataset came from a simulated sensor network and contained attack
patterns. A number of other researchers have likewise presented IDSs evaluated on their own
individual datasets. Mahmood et al. (2009) describe a test bed in "Developing a SCADA Security
Testbed". It provides a simulation of a real SCADA system and can be connected to multiple
real-world systems. With this test bed, researchers can run attacks against their models and
test IDSs against those attacks. As Hall et al. (2009) note, the limitation of a test bed is
that it does not provide a real-world dataset. Cheung et al. (2007), in their work on
model-based intrusion detection for networks, collected a dataset on the SCADA test bed at
Sandia National Laboratories. This dataset contains reconnaissance attacks on TCP protocols;
their IDS was not tested against attack categories such as denial of service and injection. The
dataset used by Yang et al. (2006) in anomaly-based intrusion detection was collected in a
laboratory simulation of a SCADA system. It contained both DoS and injection attacks but no
reconnaissance attacks. A common dataset is needed to validate third-party IDS solutions. The
dataset from this research was created to fill this void in the field. The next chapter
describes the dataset in detail.
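The coverage gaps described above can be made explicit with a small set comparison. The sketch below encodes only what the text states about the two datasets; the category names are informal labels chosen for illustration, and the `REQUIRED` set is an assumption about which categories a common dataset should cover.

```python
# Attack categories a common benchmark is assumed to need (illustrative labels).
REQUIRED = {"reconnaissance", "denial_of_service", "injection"}

# Coverage as described in the text above.
DATASETS = {
    "Cheung et al. (2007)": {"reconnaissance"},
    "Yang et al. (2006)": {"denial_of_service", "injection"},
}


def missing_categories(covered, required=REQUIRED):
    """Return the required attack categories that a dataset does not contain."""
    return sorted(required - covered)


def coverage_report(datasets, required=REQUIRED):
    """Map each dataset name to the categories it is missing."""
    return {name: missing_categories(cov, required) for name, cov in datasets.items()}
```

Neither dataset covers all three categories on its own, so an IDS evaluated on only one of them is never exercised against the missing category.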
CHAPTER 3: GAS PIPELINE DATASET
3.1 Introduction
In 1999 the DARPA dataset was created by Lincoln Laboratory to test the effectiveness of IDSs.
The researchers' main objective in producing the dataset was to test the viability of IDSs. The
dataset has played an important role in evaluating computer network IDSs and has given
researchers a benchmark against which to validate their results. It was collected from a
simulated air force base network that was connected to the Internet during collection. The
network traffic from the simulated network was captured in tcpdump form. The dataset also
included information such as Sun BSM data, file-related information and sniffed network packets
(Moya et al., 2009).
A thesis on the dataset explains the attacks that were carried out in detail. The attacks fall
into five groups: data attacks, user to root, denial of service, remote to local and probe.
Data attacks are used to extract files. The security policy states that such files should be
kept on the host computer; in a data attack, confidential or secret files leave the computer
when they are accessed by an authenticated client. User to root attacks give a user privileges
beyond those they are entitled to. A remote to local attack gives the attacker access to the
victim's machine, after which the attacker can transmit modified files or extract data from
that machine. Denial of service attacks are designed to disrupt the transmission of data from
networked machines. Probing attacks gather specific system information, such as the IP
addresses of local machines and the local operating system (Gao et al., 2010). Researchers have
built IDSs tailored to these applications by using datasets that include both attacks and
normal activity. The same approach is followed for SCADA systems, but there is currently no
such dataset that all researchers can access and use.
3.2 Previous work
The dataset created for this research is the second iteration of a previous dataset. It comes
from a gas pipeline system and is intended to fill a void in IDS research for SCADA
applications. Wei Gao created the first iteration of the dataset. Gao's dataset was found to
contain obvious patterns, and these patterns led to extremely high detection rates. Jones
(2012) set out to determine whether machine learning algorithms can be used for anomaly
detection in SCADA systems, and tested the effectiveness of these algorithms on Gao's dataset.
According to Thornton, Gao's dataset contained many problems and was not a suitable choice for
IDS research. It was unsuitable because of correlations between specific parameters and the
outcomes predicted by the algorithms. These correlations are unrealistic in real SCADA
transactions and render the dataset unsuitable in its current form.
The unrealistic transactions arose because the system was placed in only three distinct
configurations. A new process was created to remove this obvious pattern from the dataset. In the
new process, every possible state configuration in which the system can be placed was generated,
so that the normal operation of the gas pipeline is represented. The states were selected randomly
to reduce the chance of unintended patterns. The unvarying attacks run against the system also
caused obvious patterns in the dataset (Lin et al., 2013): the attacks were static and contained
no dynamically changing patterns. The new process addresses this problem by parametrizing the
attacks and randomizing the order in which they are executed. New attacks were also created
alongside the existing ones. The gas pipeline system used to create these datasets is discussed
below.
3.3 Gas pipeline system
The gas pipeline system was provided by the Mississippi State University SCADA lab, and the
datasets were collected using it. The system has three major components: the sensors and
actuators, a communication network, and supervisory control.
At the lowest level, the system contains a pressure sensor and two actuators. The actuators, a
pump and a solenoid, control the physical process and are used to maintain the pressure that is
set by supervisory control. The gas pipeline has three main system modes: automatic, manual, and
off. When the system is in automatic mode, there are two schemes for maintaining the pressure, and
supervisory control decides which one is used. The first scheme is pump mode, in which the pump is
turned on and off to keep the pipe pressure at the set point. This scheme was created to simulate
a constant load on the system. The second scheme is solenoid mode, in which the relief valve
controlled by the solenoid is opened and closed to regulate pressure. Both pump and solenoid modes
use a proportional-integral-derivative (PID) control scheme. The system can also be placed in
manual mode, which allows the operator to control the pump and solenoid manually.
The communication network uses the serial Modbus RTU protocol. Modbus packets consist of a header
and a payload. Over a serial line, a Modbus packet contains a device address, the payload, and a
cyclic redundancy code (CRC). A Modbus/TCP packet contains the Modbus Application Protocol (MBAP)
header, a function code, and the payload. The MBAP header contains a transaction identifier, a
protocol identifier, a length, and a device identifier (Cherry, 2010). The device identifier plays
the same role as the device address in Modbus over serial line. The datasets used in this work
were captured from Modbus over serial line. They can safely be used as a proxy for Modbus/TCP
data, with the exception that they contain no length field or protocol identifier. A visual
representation of Modbus/TCP and Modbus RTU packets is shown below.
The transaction identifier provides a count of transaction numbers. The protocol identifier is
always 0 for legal Modbus/TCP packets. The length is the number of bytes that follow the length
field: the payload, plus one byte for the function code and one for the device identifier.
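The header fields described above can be unpacked with a few lines of Python. This is a minimal sketch rather than code from the dataset tooling: the names `MbapHeader`, `parse_mbap` and `is_well_formed`, and the example frame, are assumptions made for illustration. The length field is taken to count every byte that follows it (device identifier, function code and payload data), as in the Modbus/TCP specification.

```python
import struct
from typing import NamedTuple


class MbapHeader(NamedTuple):
    transaction_id: int  # counts transactions
    protocol_id: int     # always 0 for legal Modbus/TCP packets
    length: int          # number of bytes that follow the length field
    unit_id: int         # device identifier


def parse_mbap(frame: bytes) -> MbapHeader:
    """Unpack the 7-byte MBAP header at the start of a Modbus/TCP frame."""
    if len(frame) < 7:
        raise ValueError("frame too short for an MBAP header")
    # Big-endian: three 16-bit fields followed by one 8-bit field.
    return MbapHeader(*struct.unpack(">HHHB", frame[:7]))


def is_well_formed(frame: bytes) -> bool:
    """Check the two header rules: protocol id is 0 and length is consistent."""
    header = parse_mbap(frame)
    # The first 6 bytes (transaction id, protocol id, length) are not counted.
    return header.protocol_id == 0 and header.length == len(frame) - 6


# Hypothetical frame: device 4, function 0x03, start address 1, quantity 2.
example = bytes.fromhex("0001 0000 0006 04 03 0001 0002")
print(parse_mbap(example), is_well_formed(example))
```

A check of this kind only confirms that the header is self-consistent; it says nothing about whether the command itself is legitimate.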
The payload is identical in Modbus/TCP and Modbus over serial line packets (Pires and Oliveira,
2006). The most common commands are the Modbus read and write commands. Read and write payloads
carry additional packet attributes, such as coil or register addresses, error codes, and exception
codes. Less common commands, such as diagnostics and file record access, include sub-function
codes and have their own attributes to describe particular queries and responses.
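To make the serial packet layout concrete, the following sketch builds and parses a Modbus over serial line frame. It assumes the usual layout of device address, function code, payload and a two-byte CRC sent low byte first, and it uses the standard CRC-16/MODBUS algorithm. The helper names `build_rtu_frame` and `parse_rtu_frame` are hypothetical and are not part of the attack framework described in this chapter.

```python
def crc16_modbus(data: bytes) -> int:
    """CRC-16/MODBUS: initial value 0xFFFF, reflected polynomial 0xA001."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc


def build_rtu_frame(address: int, function: int, payload: bytes) -> bytes:
    """Assemble address + function code + payload and append the CRC."""
    body = bytes([address, function]) + payload
    return body + crc16_modbus(body).to_bytes(2, "little")  # low byte first


def parse_rtu_frame(frame: bytes) -> dict:
    """Split a frame into its fields and report whether the CRC matches."""
    if len(frame) < 4:
        raise ValueError("frame too short")
    received_crc = int.from_bytes(frame[-2:], "little")
    return {
        "address": frame[0],
        "function": frame[1],
        "payload": frame[2:-2],
        "crc_ok": crc16_modbus(frame[:-2]) == received_crc,
    }


# Hypothetical read request: device 4, function 0x03, start 0, quantity 2.
frame = build_rtu_frame(0x04, 0x03, bytes.fromhex("00000002"))
print(parse_rtu_frame(frame))
```

Flipping any single bit of the frame makes `crc_ok` false, which is the property that a bad-CRC attack exploits to make the receiver discard packets.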
Supervisory control is the third component of the gas pipeline. It comprises the MTU and the iFIX
HMI. The MTU is set up in a one-to-many configuration: all slave devices receive their commands
from a single MTU, and the many RTUs send their responses back to that MTU. The HMI connects to
the MTU and provides an interface through which a human operator can monitor the system and
exercise supervisory control when needed.
The next section describes the process by which the dataset was collected and gives a detailed
description of the dataset. The chapter also includes a discussion showing that the unintended
patterns have been removed from the dataset.
3.4 Dataset Collection Methodology
A new method of providing stimulus and logging data was used to create the dataset. The first step
in improving the dataset was to parametrize the attacks and randomize the order in which they are
executed. This was done for all attacks. The attacks were implemented as man-in-the-middle
functions, and every type of attack was included in the man-in-the-middle framework.
In an interception attack, transmitted packets reach both the intended receiver and the attacker.
This type of attack lets the attacker gain information about the nodes in a system; for example,
an attacker may learn the brand and model of the RTU used by the system. An interruption attack
blocks all communication between two nodes in a system. In the gas pipeline, this type of attack
would cause a denial of service between the MTU and the RTU slave device. In a modification
attack, the attacker modifies parameters; in the gas pipeline, for example, the set point
parameters can be modified. In a fabrication attack, a new packet is created and sent between the
MTU and RTU (Brundle and Naedele, 2008). The attacks in the gas pipeline dataset fall into these
categories and can be divided further. The categories of attacks are shown in the table given
below.
Parametrization was accomplished by establishing the ranges within which each attack operates.
These ranges provide coverage of all the ways in which a specific attack can be executed. For
instance, the set point manipulation attack modifies the set point parameter, which controls the
pressure level in the gas pipeline. Once each attack had been parametrized, an algorithm was
designed to execute the attacks in random order.
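The idea of parametrized attacks executed in random order can be sketched as follows. This is an illustration only, not the scheduling algorithm used to build the dataset: the attack names, the parameter ranges in `ATTACK_RANGES` and the function `build_schedule` are all hypothetical. Each attack appears the same number of times, the order is shuffled, and each execution draws its parameter uniformly from the attack's range.

```python
import random

# Hypothetical attacks and parameter ranges (low, high); values are made up.
ATTACK_RANGES = {
    "setpoint_manipulation": (0.0, 30.0),
    "pid_gain": (0.0, 10.0),
    "pid_reset_rate": (0.0, 5.0),
}


def build_schedule(ranges, repeats, seed=None):
    """Return a shuffled list of (attack name, parameter value) pairs."""
    rng = random.Random(seed)
    # Every attack is listed the same number of times before shuffling.
    order = [name for name in ranges for _ in range(repeats)]
    rng.shuffle(order)
    # Each execution draws a fresh parameter from the attack's own range.
    return [(name, rng.uniform(*ranges[name])) for name in order]


for name, value in build_schedule(ATTACK_RANGES, repeats=2, seed=1):
    print(f"{name}: {value:.2f}")
```

Fixing the seed makes a schedule reproducible, which is useful when a capture has to be repeated.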
The intent of the algorithm is to execute each attack an equal number of times and to minimize
the unintended patterns discovered in the first iteration of the dataset. This does not mean that
the same number of packets is created or changed for every attack: some attacks need only a few
packets to execute, whereas others need many. For instance, the function code scan attack scans
all function codes in the Modbus framework, so the number of packets for this attack is higher.
After the attack patterns had been randomized, the normal states were randomized as well. This was
accomplished with an AutoIt script. AutoIt is a Windows scripting language that automates
interaction with a GUI by simulating mouse movements and keystrokes, which made direct interaction
with the iFIX HMI possible. The HMI displays information about the gas pipeline and its controls,
and provides a visual representation of the current state and operation of the pipeline. The
AutoIt script simulates an operator changing the state of the system and the PID parameters.
During testing there is a physical constraint that prevents the pump from running constantly: the
pump needs a cool-down time of twenty minutes for every seven minutes of running time. The script
therefore runs the system at a 25.9% duty cycle.
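The duty cycle figure follows directly from the run time and cool-down time quoted above. The short calculation below assumes that one cycle consists of seven minutes of running followed by twenty minutes of cool-down; the variable names are chosen for illustration.

```python
RUN_MINUTES = 7         # pump running time per cycle
COOL_DOWN_MINUTES = 20  # pump cool-down time per cycle

# Duty cycle = fraction of each cycle during which the system is active.
duty_cycle = RUN_MINUTES / (RUN_MINUTES + COOL_DOWN_MINUTES)
print(f"{duty_cycle:.1%}")  # 7 / 27, prints 25.9%
```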
A data logger records the packets received by either the MTU or the RTU. The data logger sits on
the man-in-the-middle PC and is integrated directly into the attack framework through C file input
and output.
3.5 Dataset Description
The datasets from this work are provided in two forms. The first is a comma-separated values (CSV)
text file. The second is the Attribute-Relation File Format (ARFF). The ARFF dataset was created
for use with WEKA, the Waikato Environment for Knowledge Analysis. WEKA provides a comprehensive
collection of machine learning algorithms, and many researchers in the IDS field use it to test
the performance of specific algorithms. The dataset consists of the packets delivered between the
MTU and the RTU. Each instance contains network traffic information along with payload
information. The network information gives an intrusion detection system the pattern of
communication (Weiss, 2014). Network topologies in SCADA systems are fixed, and the transactions
between nodes are repetitive and regular. This static behaviour is conducive to IDSs detecting
anomalous activity. The payload information is the second category of features. It describes the
state of the gas pipeline, its settings, and its parameters. These values are vital for
understanding how the system is performing and whether it is in a critical state. Each dataset
contains a total of 274,647 instances. Each row contains multiple columns, known as features,
which are discussed further below. A consequence of presenting each Modbus frame as a row in the
dataset is that not all frames contain the same information, so for some instances many features
are unknown.
3.5.1 Raw Dataset
This dataset contains the raw, unprocessed network traffic data. The raw data is provided as a
means of verifying the legitimacy of the pre-processed ARFF dataset, and to allow researchers to
pre-process the data with their own specialized methods. Each instance in the raw dataset has six
features. The first feature is the Modbus frame, as received by either the master device or the
slave device (Valentine and Farkas, 2011). The Modbus frame contains all the information about the
network, the state, and the gas pipeline. The frame can be processed once its function code has
been determined. The memory map used by the system is given in Appendix A, where a diagram shows
the register values for both the master-side and slave-side PLCs. The memory map also identifies
the information held in each register, such as state information, set point, and PID parameters.
For each register on the PLC, the frame can be pre-processed into 4 different features. An example
of a Modbus frame carrying a write command from the MTU to the RTU is shown in the diagram below.
The write command is written to register 40002. In the Modbus protocol, read/write registers are
numbered from 40001. These registers hold the state and parameter information for the complete gas
pipeline system. The ARFF dataset provides features extracted from these register locations.
The second and third features in a raw dataset row represent the category of attack and the
particular attack taking place. The specific category values are given in Tables 3.5, 3.6, 3.7 and
3.8: the second feature is the major category (Table 3.5) and the third feature is the specific
attack (Tables 3.6, 3.7 and 3.8). For a Modbus frame from normal operation, both features report
zero. Both features are important for training a supervised learning algorithm, because they allow
the algorithm to learn the behaviour of attack patterns. Those tables give a one-to-one mapping
between each label value and the description of its category and specific attack.
The fourth and fifth features in a raw dataset row give the source and destination of the frame.
Each of these features has only three possible values: '1' represents the master device, '2' the
man-in-the-middle computer, and '3' the slave device. The main purpose of these features is to
provide a label that explains the origin of the packet and helps in pre-processing the raw
dataset. The last feature in the raw dataset is the time stamp, from which the time interval
between packets can be calculated. This can also help an IDS: the time interval changes only
marginally during normal operation, whereas malicious command injection or other changes can cause
a large change in the time interval.
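The six raw features and the time interval derived from the time stamp can be illustrated with a short parsing sketch. The exact file layout is not given in this chapter, so the sketch assumes one comma-separated row per frame in the order frame (as hexadecimal text), category, specific attack, source, destination, time stamp. The rows in `SAMPLE` are invented for illustration and are not taken from the dataset.

```python
import csv
import io
from dataclasses import dataclass
from typing import List


@dataclass
class RawRow:
    frame: bytes       # Modbus frame
    category: int      # attack category, 0 for normal traffic
    specific: int      # specific attack number, 0 for normal traffic
    source: int        # 1 = master, 2 = man-in-the-middle, 3 = slave
    destination: int   # same encoding as source
    timestamp: float   # seconds (assumed unit)


def parse_raw(text: str) -> List[RawRow]:
    """Parse rows in the assumed six-column layout."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # skip blank lines
        frame, category, specific, source, destination, timestamp = record
        rows.append(RawRow(bytes.fromhex(frame), int(category), int(specific),
                           int(source), int(destination), float(timestamp)))
    return rows


def time_intervals(rows: List[RawRow]) -> List[float]:
    """Time elapsed between each frame and the one before it."""
    return [b.timestamp - a.timestamp for a, b in zip(rows, rows[1:])]


# Invented rows: three normal frames, then an injected response.
SAMPLE = """\
010300000001840a,0,0,1,3,10.00
01030200050000,0,0,3,1,10.25
010300000001840a,0,0,1,3,10.50
0103020bb80000,1,30,2,1,12.50
"""

rows = parse_raw(SAMPLE)
print(time_intervals(rows))
```

In this invented sample the interval before the injected frame is several times larger than the others, which is the kind of change the time interval feature is meant to expose.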
3.5.2 ARFF dataset
The ARFF dataset was created for use with WEKA. It contains twenty features, some of which are the
same as those in the raw dataset. All twenty features are shown in the table given below.
Table 3.2 Feature List
No.  Feature       No.  Feature
1    Address       11   Control scheme
2    Function      12   Pump
3    Length        13   Solenoid
4    Set point     14   Pressure measurement
5    Gain          15   CRC rate
6    Reset rate    16   Command response
7    Deadband      17   Time
8    Cycle time    18   Binary outcome
9    Rate          19   Categorized result
10   System mode   20   Specific result
The first feature is the station address of the slave device. The station address is a unique
eight-bit value assigned to each master and slave device. According to Anderson (2001), the
address identifies the slave: it indicates which slave the master is sending commands to and which
slave is responding. The Modbus protocol is configured so that every slave device receives all
master transactions, so each slave must check the station address to find out whether a message is
intended for itself or for another slave device. This feature can be used to improve the detection
of device scan attacks, in which commands are sent to every possible station address in order to
discover which addresses are in use. The second feature is the function code. In the gas pipeline,
the function code is used primarily for read (0x03) and write (0x16) commands. About 256 different
function codes exist, and some of them can be used for malicious purposes. One example is function
code 0x08, which is used for diagnostic purposes and can force a slave device into listen-only
mode. Because the function code is valid, an attack of this kind causes a denial of service. IDSs
can use this feature to detect function codes that are out of the ordinary. The third feature is
the Modbus frame length. The length of the Modbus frame is fixed for every command query or
response and does not change. The gas pipeline system repeatedly uses one set of read and write
commands, which perform block reads and block writes on particular registers. During attack
detection, frames that do not have one of the expected lengths are easily detected as anomalous.
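Because the function codes and frame lengths seen in normal traffic form a small fixed set, a very simple check can already flag unusual frames. The sketch below is an illustration of that idea and not an IDS from this work: it assumes each frame is available as raw bytes with the function code in the second byte, and the helper names and example frames (including their placeholder CRC bytes) are hypothetical.

```python
def learn_profile(normal_frames):
    """Collect the (function code, frame length) pairs seen in normal traffic."""
    return {(frame[1], len(frame)) for frame in normal_frames}


def is_anomalous(frame, profile):
    """Flag a frame whose function code or length was never seen before."""
    return (frame[1], len(frame)) not in profile


# Hypothetical normal traffic: a read request and its response
# (the last two bytes stand in for the CRC).
normal = [
    bytes.fromhex("0403000100020000"),    # function 0x03, 8 bytes
    bytes.fromhex("040304000a000b0000"),  # function 0x03, 9 bytes
]
profile = learn_profile(normal)

diagnostic = bytes.fromhex("0408000400000000")    # function 0x08, 8 bytes
oversized = bytes.fromhex("04030001000200000000")  # function 0x03, 10 bytes

print(is_anomalous(normal[0], profile))   # seen before
print(is_anomalous(diagnostic, profile))  # unseen function code
print(is_anomalous(oversized, profile))   # unseen length
```

A profile of this kind cannot detect attacks that reuse normal function codes and lengths, which is why the payload features are needed as well.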
The fourth feature is the set point value, which is used to control the pressure in the gas
pipeline. The set point is used when the gas pipeline is in automatic mode. The slave ladder logic
attempts to maintain the set point value by turning the pump on or off or by opening the solenoid
valve. The set point has a drastic effect on the physical system (Meserve, 2007), which makes it a
common target for an attacker with malicious intent. The next five features are the PID controller
values. Gain, reset rate, deadband, cycle time, and rate are the values used to tune the PID
controller (Carr, 2014). An error is calculated on the basis of these five parameters, and to
minimize it the PID controller can open and close the relief valve or turn the pump on or off.
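The role of the five tuning values can be seen in a generic PID update. The sketch below uses a common textbook form and is not the control algorithm running on the PLC, which is not described in this chapter. The class name `PID`, the units, and the way the deadband is applied (errors inside the band are treated as zero) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PID:
    gain: float        # proportional gain
    reset_rate: float  # integral action per second (assumed unit)
    rate: float        # derivative time in seconds (assumed unit)
    deadband: float    # errors no larger than this are ignored
    cycle_time: float  # seconds between controller updates
    _integral: float = 0.0
    _previous_error: float = 0.0

    def update(self, set_point: float, measurement: float) -> float:
        """Return the controller output for one cycle."""
        error = set_point - measurement
        if abs(error) <= self.deadband:
            error = 0.0  # inside the deadband: no corrective action
        self._integral += error * self.cycle_time
        derivative = (error - self._previous_error) / self.cycle_time
        self._previous_error = error
        return self.gain * (error
                            + self.reset_rate * self._integral
                            + self.rate * derivative)


controller = PID(gain=2.0, reset_rate=0.5, rate=0.0, deadband=0.5, cycle_time=1.0)
print(controller.update(set_point=10.0, measurement=9.8))  # within the deadband
print(controller.update(set_point=10.0, measurement=8.0))  # outside the deadband
```

In a pump or solenoid scheme the output would then be turned into an on/off or open/close decision, for example by comparing it with a threshold. Changing any of the five values alters how strongly and how quickly the controller reacts, which is why each one is a target for parameter injection attacks.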
The tenth feature is the system mode, which controls the system's duty cycle. It has only three
valid values, which are shown in the table below.
Table 3.3 System mode features
Value  System mode
0      Off
1      Manual
2      Automatic
The gas pipeline is configured to run at a 25.9% duty cycle; whenever the system is not active,
the system mode feature is set to '0'. The eleventh feature in the dataset is the control scheme,
which determines whether the gas pipeline is controlled by the pump or by the solenoid. If the
control scheme is set to pump ('0'), the solenoid remains open and the pump is cycled to maintain
the gas pressure at the set point. The pump keeps pumping against the open solenoid, which
simulates the load in a real gas pipeline. If the control scheme is set to solenoid ('1'), the
pump is on constantly, and the pressure is controlled by opening and closing the solenoid to let
pressure leak out.
The twelfth feature controls the state of the pump, but only when the system mode is set to
manual. It can take only the values '0' and '1'. If an attacker were able to change the system
mode to manual and turn the pump on, the system would be put into a critical state. This type of
attack can over-pressurize the system and cause serious physical damage. The thirteenth feature
controls the state of the solenoid when the system is in manual mode. It has two possible values:
'0' when the solenoid is closed and '1' when it is open. Similar attacks on this feature can cause
serious damage to the system. The fourteenth feature is the current pressure measurement from the
gas pipeline. The measurement is provided by a pressure gauge connected to the pipeline and is
stored in a register, which the master device reads so that the value can be displayed on the HMI
(Understanding Intrusion Detection, 2001). This feature can be used in many attacks: a false
measurement can be supplied to imitate behaviour that is not actually taking place in the system.
The fifteenth feature is the cyclic redundancy check (CRC), which is used to check for errors. The
check is provided by the master or the slave device. An attacker can constantly transmit packets
with a bad CRC, which can cause a DoS. Modbus/TCP has no CRC of its own; error checking is
provided by the TCP frame instead. The sixteenth feature helps an IDS learn the differences
between commands and responses. Its value is '0' for a response and '1' for a command. This
information is not parsed from the Modbus frame; it is provided during the pre-processing step.
The features carried over from the raw dataset, such as the binary attack and categorized attack
labels, are also provided. The features and their types are detailed in the tables below (Tables
3.4 to 3.6).
Table 3.4 Feature list
Feature               Type
Address               Network
Function              Command payload
Length                Network
Set point             Command payload
Gain                  Command payload
Reset rate            Command payload
Deadband              Command payload
Cycle time            Command payload
Rate                  Command payload
Control scheme        Command payload
Pump                  Command payload
Solenoid              Command payload
Pressure measurement  Response payload
CRC rate              Network
Command response      Network
Time                  Network
Binary attack         Label
Categorized attack    Label
Specific attack       Label
As discussed in the introduction, Flame, Stuxnet and Aurora caused tremendous concern and called
the security of current SCADA systems into question. Cyber threats and vulnerabilities illustrate
the security challenges that SCADA systems face. Chapter 2 discussed the many attack vectors
identified by researchers and the security challenges faced by SCADA systems. The papers reviewed
there discuss several categories of attack on SCADA protocols, such as command injection and
denial of service. Because the protocols are open standards, every angle of attack can be studied
and security solutions can be proposed. Many of these attacks need to be executed against the
system in order to provide a dataset for SCADA IDS research. The attacks used here were taken from
Gao's research. Gao developed seven categories of attack, which are illustrated in the table given
below.
Table 3.5 Categories of attacks [7]
Type of attack                          Abbreviation
Normal                                  Normal (0)
Naive malicious response injection      NMRI (1)
Complex malicious response injection    CMRI (2)
Malicious parameter command injection   MPCI (3)
Malicious function code injection       MFCI (4)
Denial of service                       DoS (5)
Reconnaissance                          Recon (6)
Malicious state command injection       MSCI (7)
The seven categories of attack are split into four classes: command injection, reconnaissance,
denial of service, and response injection. Descriptions of the attacks are given in Gao's work. In
this work all of the attacks have been modified slightly, but their behaviour is the same. The
command injection attacks comprise MSCI, MPCI, and MFCI. The response injection attacks exhibit
two types of behaviour. The first is NMRI, which produces out-of-bounds behaviour; this type of
attack takes place when the attacker lacks information about the physical process. The second is
CMRI, which leverages the state of the process. The attacks are categorised in seven forms, which
are represented in the table below:
Table 3.1 Attack Categorization
Type of Attacks                          Abbreviation
Normal                                   Normal (0)
Naïve Malicious Response Injection       NMRI (1)
Complex Malicious Response Injection     CMRI (2)
Malicious State Command Injection        MSCI (3)
Malicious Parameter Command Injection    MPCI (4)
Malicious Function Code Injection        MFCI (5)
Denial of Service                        DoS (6)
Reconnaissance                           Recon (7)
These seven categories are divided into four classes: command injection, denial of service (DoS),
reconnaissance, and response injection. The attacks are taken from Gao's work and are described
further below. They behave in the same way as Gao's attacks, although they have been modified
slightly. The command injection attacks comprise malicious parameter command injection (MPCI),
malicious state command injection (MSCI), and malicious function code injection (MFCI). The
response injection attacks exhibit two types of behaviour. The first is naïve malicious response
injection (NMRI), whose behaviour is out of bounds and not normally seen in general operation
(Cheung et al., 2007). These attacks usually occur when the attacker has little information about
the physical system's process (Cryptography and Security in Computing, 2012). The second type is
complex malicious response injection (CMRI). These attacks use information about the physical
process and its state to design attacks whose behaviour appears normal.
Table 3.6 Cyber Attacks 1-12
Attack Name             Number  Type  Description
Setpoint Attacks        1-2     MPCI  Changes the pressure set point outside and inside of the range of normal operation.
PID Gain Attacks        3-4     MPCI  Changes the gain outside and inside of the range of normal operation.
PID Reset Rate Attacks  5-6     MPCI  Changes the reset rate outside and inside of the range of normal operation.
PID Rate Attacks        7-8     MPCI  Changes the rate outside and inside of the range of normal operation.
PID Deadband Attacks    9-10    MPCI  Changes the dead band outside and inside of the range of normal operation.
PID Cycle Time Attacks  11-12   MPCI  Changes the cycle time outside and inside of the range of normal operation.
Reconnaissance attacks form another category. They are designed to collect information about the
system, either through passive collection or by forcing information out of a specific device. The
information gathered includes network details such as the length, CRC, and address, as well as
device features such as the communication protocols, model number, function codes, and
manufacturer.
CMRI attacks are more sophisticated than NMRI attacks. They replicate specific behaviours that
occur within normal bounds. The injected states are leveraged to make the system lose efficiency,
which can in turn cause a loss of money and product. These attacks also make it easy to hide state
changes caused by command injection attacks. Because the states injected by these attacks
represent normal operation, the attacks are even harder to detect.
Table 3.7 Cyber Attacks 13-23
Attack Name                 Number  Type   Description
Pump Attack                 13      MSCI   Randomly changes the state of the pump.
Solenoid Attack             14      MSCI   Randomly changes the state of the solenoid.
System Mode Attack          15      MSCI   Randomly changes the system mode.
Critical Condition Attacks  16-17   MSCI   Places the system in a critical condition. This condition is not included in normal activity.
Bad CRC Attack              18      DoS    Sends Modbus packets with incorrect CRC values. This can cause denial of service.
Clean Registers Attack      19      MFCI   Cleans registers in the slave device.
Device Scan Attack          20      Recon  Scans for all possible devices controlled by the master.
Force Listen Attack         21      MFCI   Forces the slave to only listen.
Restart Attack              22      MFCI   Restarts communication on the device.
Read Id Attack              23      Recon  Reads the ID of the slave device. The data about the device is not recorded, but is performed as if it were being recorded.
MSCI, MPCI, and MFCI attacks inject control and configuration commands in order to modify the
state and behaviour of the system. Command injection attacks can have a wide range of effects,
including interruption of device communications, unauthorised modification of process set points,
and unauthorised modification of device configurations. MSCI attacks change the current state of
the physical process. They can drive the system into a critical state that is harmful to the
system and can endanger lives. MPCI attacks modify the parameters that determine the set point and
PID configuration. MFCI attacks inject commands that misuse the protocol's network commands to
change the behaviour of the network. Denial of service (DoS) attacks try to disrupt communication
by interfering with wireless links and network protocols.
Table 3.8 Cyber Attacks 24-35
Attack Name                Number  Type   Description
Function Code Scan Attack  24      Recon  Scans for possible functions that are being used on the system. The data about the device is not recorded, but is performed as if it were being recorded.
Rise/Fall Attacks          25-26   CMRI   Sends back pressure readings which create trends on the pressure reading’s graph.
Slope Attacks              27-28   CMRI   Randomly increases/decreases pressure reading by a random slope.
Random Value Attacks       29-31   NMRI   Random pressure measurements are sent to the master.
Negative Pressure Attack   32      NMRI   Sends back a negative pressure reading from the slave.
Fast Attacks               33-34   CMRI   Sends back a high setpoint then a low setpoint which changes “fast”.
Slow Attack                35      CMRI   Sends back a high setpoint then a low setpoint which changes “slow”.
Tables 3.6 to 3.8 list and describe all 35 attacks in the dataset. Although the attacks were run
on this system, many of them are generic and can be used against any type of system. The dataset
can therefore be used for both internal and external research in the industrial control system
area. The following section explains why this dataset is better suited to IDS research than Gao's
dataset.
3.6 Dataset Validation
This section explains how the new dataset improves on the previous one by comparing the two
datasets.
A subset of the tests was run on the new dataset to determine whether the patterns found in Gao's
dataset had been removed. The original tests were designed to determine whether machine learning
algorithms can be used for anomaly detection in SCADA systems, and to measure the effectiveness of
these algorithms on Gao's dataset. Their outcome showed that Gao's gas pipeline dataset contained
unintended patterns. The same technique was used here to analyse whether the new dataset has the
same patterns. The dataset contains about 275,000 instances, so the algorithms need a considerable
amount of memory and time to execute. To reduce these memory and time constraints, the algorithms
were run on 10% of the full dataset; this procedure was also followed by Thornton et al., although
it is not described in their paper.
Table 3.9 List of Algorithms
Algorithm               Category
Naïve Bayesian Network  Bayes
PART                    Rule-Based
Random Tree             Decision Tree
Multilayer Perceptron   Neural Network
The first step was to run both the old and the new dataset through the machine learning algorithms
listed in the table above. The classification accuracy produced by each algorithm was collected
and compared with the results of Thornton et al. Table 3.10 below compares the two datasets.
Table 3.10 Results of Algorithms
Algorithm               New Dataset Classification Accuracy   Gao’s Dataset Classification Accuracy
Naïve Bayesian Network  80.39%                                98.5%
PART                    94.14%                                99.32%
Random Tree             99.7%                                 99.9%
Multilayer Perceptron   85.22%                                100%
The table above shows that the algorithms are less accurate at detecting anomalies on the new
dataset, which is a direct result of the new methodology used to develop it. Classification
accuracy cannot be the only statistic used to determine the effectiveness of the algorithms;
precision, recall, and the false positive (FP) rate are equally important. The false positive rate
is an essential statistic, because it can reveal discrepancies between the amounts of normal and
attack activity (Da, 2000). A common example is a system in which 99% of the traffic is normal. An
IDS that classifies all traffic as normal achieves an accuracy of 99%, yet in reality it fails to
detect any of the 1% of traffic that is anomalous. The table below gives the overall percentages
of normal and attack traffic in each dataset.
Table 3.11 Percentage of attacks in dataset
Dataset          Percentage of Attack Instances    Percentage of Normal Instances
New Dataset      21.9%                             78.1%
Gao’s Dataset    37.1%                             62.8%
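To make the accuracy argument concrete, the following minimal sketch scores a hypothetical
classifier that labels every instance as normal, using the class percentages from Table 3.11.
The function and variable names are assumptions made for illustration and do not come from the
thesis.

```python
# Class balance copied from Table 3.11 (percent of instances per data set).
CLASS_BALANCE = {
    "New Dataset": {"attack": 21.9, "normal": 78.1},
    "Gao's Dataset": {"attack": 37.1, "normal": 62.8},
}


def all_normal_baseline(attack_pct, normal_pct):
    """Score a hypothetical classifier that labels every instance as normal.

    Returns (accuracy, attack_recall) as fractions between 0 and 1.
    """
    total = attack_pct + normal_pct  # may differ slightly from 100 due to rounding
    accuracy = normal_pct / total    # only the normal instances are labelled correctly
    attack_recall = 0.0              # no attack instance is ever flagged
    return accuracy, attack_recall


baseline = {
    name: all_normal_baseline(balance["attack"], balance["normal"])
    for name, balance in CLASS_BALANCE.items()
}

for name, (accuracy, attack_recall) in baseline.items():
    print(f"{name}: accuracy={accuracy:.1%}, attack recall={attack_recall:.1%}")
```

Read this way, the 80.39% accuracy reported for the Naïve Bayesian Network in Table 3.10 is only
slightly above what this trivial baseline would reach on the new data set, which is why accuracy
alone is not a sufficient measure.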
The Kappa statistic also reflects the discrepancy between the attack and the normal
scenarios. It measures the percentage of agreement between two observers who each assign a label
to every instance in the data set, corrected for the agreement expected by chance. The Kappa
statistic reported here is 83.1%, which indicates how closely the assigned labels match.
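As an illustration of how such a Kappa value is obtained, the sketch below computes Cohen's
kappa from a confusion matrix. It assumes the standard definition (observed agreement minus
chance agreement, divided by one minus chance agreement). The counts are made up for the example
and are not taken from either data set.

```python
def cohen_kappa(confusion):
    """Cohen's kappa for a square confusion matrix.

    confusion[i][j] is the number of instances that observer A put in
    class i and observer B put in class j (for a classifier: true label i,
    predicted label j).
    """
    total = sum(sum(row) for row in confusion)
    size = len(confusion)
    # Fraction of instances on which both observers agree.
    observed = sum(confusion[k][k] for k in range(size)) / total
    # Agreement expected by chance, from the row and column totals.
    expected = sum(
        sum(confusion[k]) * sum(row[k] for row in confusion)
        for k in range(size)
    ) / (total * total)
    # Undefined (division by zero) if chance agreement is exactly 1.
    return (observed - expected) / (1 - expected)


# Made-up counts for illustration only; not taken from either data set.
example = [
    [20, 5],
    [10, 15],
]
kappa = cohen_kappa(example)
print(f"kappa = {kappa:.3f}")
```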
The PART algorithm was used for further analysis. It was chosen because it is a rule-based
algorithm, which suits a fixed network topology with regular communication patterns. The
results highlight several benefits of the new data set and show which categories of attack have
reduced patterns. The same analysis could be repeated with the other three algorithms.
Tables 3.12 and 3.14 show the attacks that were not detected by the PART algorithm.
Table 3.12 Comparison of False Positive Rates
Category    New Dataset FP (%)    Gao’s Dataset FP (%)
Normal      20.7%                 1.1%
NMRI        0.8%                  0%
CMRI        0.5%                  0.1%
MSCI        0%                    0%
MPCI        0%                    0.2%
MFCI        0%                    0%
DoS         0%                    0%
Recon       0%                    0%
According to the table, 20.7% of the attack traffic in the new data set is falsely reported as
normal, and is therefore not assigned to any attack category, compared with 1.1% for Gao's data
set. This is an improvement over Gao's data set, because the attacks are harder to distinguish
from normal traffic. Examining recall and precision shows exactly which categories of attack in
the new data set are classified incorrectly. Precision is the number of instances correctly
classified as a category of attack divided by the total number of instances classified as that
category.
The following equation shows how precision is calculated for NMRI attacks:
Precision = (number of instances correctly classified as NMRI) / (total number of instances
classified as NMRI)
Precision therefore measures what share of the instances assigned to an attack category
actually belong to that category.
Recall is the number of instances correctly classified as a category of attack divided by the
total number of instances that belong to that category. For NMRI attacks, recall is calculated
as follows:
Recall = (number of instances correctly classified as NMRI) / (total number of NMRI instances)
Recall therefore measures the true positive rate for the attack category. The following table
gives the precision and recall values for both data sets.
Table 3.13 Precision and Recall for Datasets
            New Dataset              Gao’s Dataset
Category    Precision    Recall      Precision    Recall
Normal      94.5%        99.9%       99.4%        99.5%
NMRI        74.2%        82.4%       99.5%        94.4%
CMRI        89.3%        82.1%       99.4%        99.9%
MSCI        99.3%        54.9%       97.4%        95.1%
MPCI        99.8%        63.9%       97.5%        98.0%
MFCI        98.6%        100.0%      100.0%       95.8%
DoS         99.6%        48.3%       99.8%        97.9%
Recon       100.0%       97.1%       100.0%       100.0%
Table 3.13 shows that precision and recall are high for the attack categories in Gao's dataset.
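The precision and recall definitions given earlier can be sketched as a per-category
calculation over a confusion matrix. This is a minimal illustration, not the procedure used to
produce Table 3.13. The function name and the counts in the example matrix are hypothetical.

```python
def precision_recall(confusion, labels):
    """Per-category precision and recall.

    confusion[i][j] is the number of instances whose true label is
    labels[i] and whose predicted label is labels[j].
    Returns {label: (precision, recall)}.
    """
    scores = {}
    for k, label in enumerate(labels):
        correct = confusion[k][k]
        classified_as_k = sum(row[k] for row in confusion)  # column total
        actually_k = sum(confusion[k])                      # row total
        precision = correct / classified_as_k if classified_as_k else 0.0
        recall = correct / actually_k if actually_k else 0.0
        scores[label] = (precision, recall)
    return scores


# Hypothetical counts for illustration only; not taken from either data set.
LABELS = ["Normal", "NMRI", "CMRI"]
EXAMPLE = [
    [90, 6, 4],     # true Normal
    [10, 80, 10],   # true NMRI
    [5, 14, 81],    # true CMRI
]

scores = precision_recall(EXAMPLE, LABELS)
for label, (precision, recall) in scores.items():
    print(f"{label}: precision={precision:.1%}, recall={recall:.1%}")
```

A category can combine high precision with low recall, as DoS does in the new data set: few
instances are wrongly labelled as that category, but many of its real instances are missed.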
Table 3.14 Confusion Matrix for NMRI and CMRI attacks
Table 3.14 shows how the PART algorithm classified these two categories of attack.
For DoS, MPCI and MSCI, Table 3.13 shows low recall rates despite high precision.
Only 48.3% of the DoS attack instances were classified as DoS. The Bad CRC attack was the
reason for the low recall. For DoS attacks, precision is almost the same in Gao's dataset and
the new dataset (99.8% and 99.6%), while recall differs (97.9% and 48.3%).
Figure 3.7 shows the coverage of set point values, which differs from the previous iteration
of the dataset.
In the figure above, blue bars represent normal behaviour and red bars represent packets that
contain an attack.
Only four set point values were represented in the previous iteration of the dataset. The
high detection rates in the table are therefore not realistic, because the attacks were easy to
detect. This static behaviour is not found in this feature of the new dataset.
Figure 3.8 illustrates the behaviour of the PID gain parameter.
By providing a range of values for each parameter, the new iteration provides more
coverage.
MSCI attacks were affected in a similar way. This problem has been mentioned. Figure 3.9
shows how the system is placed in all control modes.
Because the system was placed in different system modes, the measured values are more
representative and show more variance. This also limited the number of attacks that were
prevalent in previous iterations.
CONCLUSION
With increased connectivity, SCADA systems are becoming more vulnerable to outside threats,
and IDS research for industrial control systems is increasing. This thesis provides a set of
labelled network data logs captured while the system was under cyber-attack. A new methodology
was implemented to create these data logs. The logs include 35 cyber-attacks and can be used to
train and test the classifiers used by IDSs. The new dataset was compared with the previous
iteration to provide validation. Patterns that produced a correlation between features and
attacks have been removed. The dataset will also allow third-party validation of results.
REFERENCES
Books and Journals
"Cryptography and Security in Computing." (2012): n. pag. Tech Target. Web.
"Dell Security Annual Threat Report." Boom: A Journal of California 5.1 (2015): 12-13. Dell.
Dell, 2015. Web. 5 May 2015.
"Introduction to Industrial Control Networks" (PDF). IEEE Communications Surveys and
Tutorials. 2012.
"Simply Modbus - About Modbus TCP." Simply Modbus - About Modbus TCP. N.p., n.d. Web.
03 June 2015.
"Understanding Intrusion Detection." Sans.org. SANS Institute, 2001. Web. 27 Oct. 2014.
A. Almalawi et al., "An unsupervised anomaly-based detection approach for integrity attacks
on SCADA systems," Computers & Security, Volume 46, October 2014, Pages 94-110,
ISSN 0167-4048.
A. Mahmood et al., "Building a SCADA Security Testbed," Network and System Security,
2009. NSS '09. Third International Conference on, pp. 357-364, 19-21 Oct. 2009.
Boyer and Stuart. "Collecting Data from Distant Facilities." ISA. International Society of
Automation, 27 Oct. 2014. Web. Oct. 2007.
D. Dzung, M. Naedele, T.P. Von Hoff and M. Crevatin, "Security for Industrial Communication
Systems," Proceedings of the IEEE, vol. 93, no. 6, pp. 1152-1177, June 2005.
D. Kang et al., "Analysis on cyber threats to SCADA systems," Transmission & Distribution
Conference & Exposition: Asia and Pacific, 2009, pp. 1-4, 26-30 Oct. 2009.
D. Yang, A. Usynin, and J. Wesley Hines. "Anomaly-based intrusion detection for SCADA
systems." 5th intl. topical meeting on nuclear plant instrumentation, control and human
machine interface technologies (npic&hmit 05). 2006.
W. Gao et al., "On SCADA Control System Command and Response Injection and Intrusion
Detection," in Proceedings of the 2010 IEEE eCrime Researchers Summit, Dallas, TX,
Oct. 18-20, 2010.
H. Lin et al., "Adapting Bro into SCADA: building a specification-based intrusion detection
system for the DNP3 protocol," Proceedings of the Eighth Annual Cyber Security and
Information Intelligence Research Workshop, January 08-10, 2013, Oak Ridge, TN.
S. Hong and M. Lee, "Challenges and Direction toward Secure Communication in the
SCADA System," Communication Networks and Services Research Conference (CNSR),
2010 Eighth Annual, pp. 381-386, 11-14 May 2010.
J. Carr. "Snort: Open Source Network Intrusion Prevention." ESecurity Planet. ESecurity Planet,
5 June 2007. Web. 02 Nov. 2014.
J. Meserve, "Sources: Staged Cyber Attack Reveals Vulnerability in Power Grid." CNN. Cable
News Network, Sept. 2007. Web. 27 Oct. 2014.
J. Weiss, "Misconceptions about Aurora: Why Isn't More Being Done." InfoSec Island. N.p., 13
Apr. 2012. Web. 27 Oct. 2014.
J.M. Moya et al., "Improving Security for SCADA Sensor Networks with Reputation
Systems and Self-Organizing Maps," Sensors 2009, 9, 9380-9397.
K. Da., 2000. Attack development for intrusion detection. Master’s Thesis. Massachusetts
Institute of Technology, Cambridge, MA.
M. Brundle and M. Naedele, "Security for process control systems: An overview," IEEE
Security & Privacy, vol. 6, no. 6, pp. 24-29, 2008.
M. Hall et al. (2009). The WEKA Data Mining Software: An Update. SIGKDD
Explorations, Volume 11, Issue 1.
P.S.M. Pires and L.A.H.G. Oliveira, "Security Aspects of SCADA and Corporate Network
Interconnection: An Overview," Dependability of Computer Systems, 2006. DepCos-
RELCOMEX '06. International Conference on, pp. 127-134, 25-27 May 2006.
doi: 10.1109/DEPCOS-RELCOMEX.2006.46
R. Anderson (2001). Security Engineering: A Guide to Building Dependable Distributed
Systems. New York: John Wiley & Sons. pp. 660-667.
S. Cherry, How Stuxnet Is Rewriting the Cyberterrorism Playbook. 2010. Available at:
http://spectrum.ieee.org/podcast/telecom/security/how-stuxnet-is-rewriting-the-cyberterrorism-playbook
accessed on 09.05.2014.
S. Cheung et al. "Using model-based intrusion detection for SCADA networks."
Proceedings of the SCADA security scientific symposium. Vol. 46. 2007.
S. Valentine and C. Farkas, "Software security: Application-level vulnerabilities in SCADA
systems," Information Reuse and Integration (IRI), 2011 IEEE International Conference
on, pp. 498-499, 3-5 Aug. 2011.
W. Jones, "Flame: Cyberwarfare's Latest, Greatest Weapon." IEEE Spectrum. IEEE, May
2012. Web. 27 Oct. 2014.