Early Detection of Cybersecurity Threats Using Collaborative

Added on 2022-11-13

Early Detection of Cybersecurity Threats
Using Collaborative Cognition
Sandeep Narayanan, Ashwinkumar Ganesan, Karuna Joshi, Tim Oates, Anupam Joshi and Tim Finin
Department of Computer Science and Electrical Engineering
University of Maryland, Baltimore County, Baltimore, MD 21250, USA
{sand7, gashwin1, kjoshi1, oates, joshi, finin}@umbc.edu
Abstract—The early detection of cybersecurity events such as
attacks is challenging given the constantly evolving threat land-
scape. Even with advanced monitoring, sophisticated attackers
can spend more than 100 days in a system before being detected.
This paper describes a novel, collaborative framework that assists
a security analyst by exploiting the power of semantically rich
knowledge representation and reasoning integrated with differ-
ent machine learning techniques. Our Cognitive Cybersecurity
System ingests information from various textual sources and
stores it in a common knowledge graph using terms from
an extended version of the Unified Cybersecurity Ontology. The
system then reasons over the knowledge graph that combines a
variety of collaborative agents representing host and network-
based sensors to derive improved actionable intelligence for
security administrators, decreasing their cognitive load and
increasing their confidence in the result. We describe a proof
of concept framework for our approach and demonstrate its
capabilities by testing it against a custom-built ransomware
similar to WannaCry.
I. INTRODUCTION
A wide and varied range of security tools and systems are
available to detect and mitigate cybersecurity attacks, includ-
ing intrusion detection systems (IDS), intrusion detection and
prevention systems (IDPS), firewalls, advanced security appli-
ances (ASA), next-gen intrusion prevention systems (NGIPS),
cloud security tools, and data center security tools. However,
cybersecurity threats and the associated costs to defend against
them are surging. Sophisticated attackers can still spend more
than 100 days [8] in a victim’s system without being detected.
23,000 new malware samples are produced daily [33] and
a company’s average cost for a data breach is about $3.4
million according to a Microsoft study [20]. Several factors,
ranging from information flooding to slow response times,
render existing techniques ineffective and unable to reduce
the damage caused by these cyber-attacks.
Modern security information and event management (SIEM)
systems emerged when early security monitoring systems
like IDSs and IDPSs began to flood security analysts with
alerts. LogRhythm, Splunk, IBM QRadar, and AlienVault are
a few of the commercially available SIEM systems [11]. A
typical SIEM collects security-log events from a large array
of machines in an enterprise, aggregates this data centrally, and
analyzes it to provide security analysts with alerts. However,
despite ingesting large volumes of host/network sensor data,
their reports are hard to understand, noisy, and typically
lack actionable details [39]. 81% of users reported being
bothered by noise in existing systems in a recent survey on
SIEM efficiency [40]. What such systems lack is a
collaborative effort: not just aggregating data from host and
network sensors, but also integrating it and reasoning over
both threat intelligence and sensed data gathered from
collaborative sources.
In this paper, we describe a cognitive assistant for the early
detection of cybersecurity attacks that is based on collabo-
ration between disparate components. It ingests information
about newly published vulnerabilities from multiple threat
intelligence sources and represents it in a machine-inferable
knowledge graph. The current state of the enterprise/network
being monitored is also represented in the same knowledge
graph by integrating data from the collaborating traditional
sensors, like host IDSs, firewalls, and network IDSs. Unlike
many traditional systems that present this information to
an analyst to correlate and detect, our system fuses threat
intelligence with observed data to detect attacks early, ideally
before the exploit has started. Such a cognitive analysis not
only reduces the false positives but also reduces the cognitive
load on the analyst.
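The idea of holding threat intelligence and observed state in one graph can be illustrated with a minimal sketch. It uses plain Python tuples as triples; every subject, predicate, and object name below is a hypothetical label chosen for illustration and is not taken from UCO or from our implementation.

```python
# Minimal sketch: one set of (subject, predicate, object) triples holds both
# threat intelligence and sensor observations. All names are hypothetical.
graph = set()


def add(subject, predicate, obj):
    graph.add((subject, predicate, obj))


# Threat intelligence, e.g. extracted from a security bulletin.
add("smb_rce_vulnerability", "affects", "smbv1_server")
add("smb_rce_vulnerability", "exploited_via", "crafted_smb_message")

# Current state reported by collaborating host and network sensors.
add("host_a", "runs", "smbv1_server")
add("host_b", "runs", "web_server")
add("host_a", "received", "crafted_smb_message")
add("host_b", "received", "crafted_smb_message")


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def subjects(predicate, obj):
    """All subjects s such that (s, predicate, obj) is in the graph."""
    return {s for s, p, o in graph if p == predicate and o == obj}


def hosts_at_risk(vulnerability):
    """Hosts that run an affected service and received a matching message."""
    at_risk = set()
    exploit_events = objects(vulnerability, "exploited_via")
    for service in objects(vulnerability, "affects"):
        for host in subjects("runs", service):
            if objects(host, "received") & exploit_events:
                at_risk.add(host)
    return at_risk


print(hosts_at_risk("smb_rce_vulnerability"))
```

In this toy graph only the host that both runs the affected service and received the suspicious message is flagged, which is the kind of correlation an analyst would otherwise perform by hand.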
Cyber threat intelligence comes from a variety of textual
sources. A key challenge with sources like blogs and security
bulletins is their inherent incompleteness. Often, they are
written for specific audiences and do not explain or define
what each term means. For example, a Microsoft security
bulletin states, “The most severe of the vulnerabilities could
allow remote code execution if an attacker sends specially
crafted messages to a Microsoft Server Message Block 1.0
(SMBv1) server.” [22]. Since this text is intended for security
experts, the rest of the bulletin does not define or describe
remote code execution or an SMB server.
To fill this gap, we use the Unified Cybersecurity Ontology
(UCO)¹ [36] to represent cybersecurity domain knowledge.
It provides a common semantic schema for information
from disparate sources, allowing their data to be integrated.
Concepts and standards from different intelligence sources like
STIX [1], CVE [21], CCE [24], CVSS [9], CAPEC [23],
CYBOX [25], and STUCCO [12] can be represented directly
using UCO.
¹ https://github.com/Ebiquity/Unified-Cybersecurity-Ontology
We have developed a proof of concept system that ingests
information from textual sources, combines it with the knowledge
about a system’s state as observed by collaborating hosts
and network sensors, and reasons over them to detect known
(and potentially unknown) attacks. We developed multiple
agents, including a process monitoring agent, a file monitoring
agent and a Snort agent, that run on respective machines and
provide data to the Cognitive CyberSecurity (CCS) module.
This module reasons over the data and stored knowledge graph
to detect various cybersecurity events. The detected events
are then reported to the security analyst using a dashboard
interface described in Section V-D. We also developed a
custom ransomware program, similar to WannaCry, to test the
effectiveness of our prototype system. Its design and operation
are described in Section VI-A. We build upon our earlier work
in this domain [26].
The rest of this paper is organized as follows. Section II
identifies key challenges in cybersecurity attack detection fol-
lowed by a brief discussion of related work in Section III. Our
cognitive approach to detect cybersecurity events is described
in Section IV. Implementation details of our prototype system
and a concrete use case scenario to demonstrate our system’s
effectiveness are in Sections V and VI, before we discuss our
future directions in Section VII.
II. BACKGROUND
Despite the existence of several tools in the security space,
attack detection is still a challenging task. Often, attackers
adapt themselves to newer security systems and find new ways
past them. This section describes some challenges in detecting
cybersecurity attacks.
A critical issue that affects the spread and associated
costs of a cyber-attack is the time gap between an exploit
becoming public and the systems being patched in response.
This is evident with the infamous WannaCry ransomware.
The core vulnerability used by WannaCry (Windows SMB
Remote Code Execution Vulnerability) was first published in
Microsoft Security Bulletin [22] and by Cisco NGFW in March
2017. Later, in April 2017, Shadow Brokers (a hacker group)
released a set of tools including Eternal Blue² and Double
Pulsar, which used this vulnerability to gain access to victim
machines. It was only by mid-May that the actual WannaCry
ransomware started to spread³ internally using these tools. The
large-scale spread of WannaCry, which affected over two hundred
thousand machines, could have been mitigated if the threat had
been quickly identified and affected systems had been patched.
² https://en.wikipedia.org/wiki/EternalBlue
³ https://en.wikipedia.org/wiki/WannaCry_ransomware_attack
⁴ https://blog.checkpoint.com/2016/04/11/decrypting-the-petya-ransomware/
Variations of the same cyber-attack are another challenge
faced by existing attack detection systems. Many enterprise
tools still use signatures and policies specific to attacks for
detection. However, smart attackers evade such systems by
slightly modifying existing attacks. Sometimes, hackers even
use combinations of tools from other attacks to evade them.
An example is the Petya ransomware⁴ attack, which was
discovered in 2016, spread via email attachments, and
infected computers running Windows. It overwrites the Master
Boot Record (MBR), installs a custom boot loader, and forces
a system to reboot. The custom boot loader then encrypts the
Master File Table (MFT) records and renders the complete
file system unreadable. The attack did not result in
large-scale infection of machines. However, another attack surfaced
in 2017 that shares significant code with Petya. In the new
attack, named NotPetya⁵, attackers use Eternal Blue to spread
rather than using email attachments. Often, the malware itself
is encrypted, so similar code is hard to detect. By modifying
how the malware spreads, attackers can also bypass systems
that detect potential behavioral signatures.
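The weakness of exact signatures against variants can be shown with a small sketch. It is an illustration only: the byte strings are placeholders rather than real samples, the action labels are hypothetical, and real detection products use far richer signatures than a single hash.

```python
import hashlib


def sha256_hex(payload: bytes) -> str:
    return hashlib.sha256(payload).hexdigest()


# Placeholder byte strings standing in for an original sample and a variant.
original = b"ransomware-sample-v1"
variant = b"ransomware-sample-v2"

# Exact-match signature database built from the original sample only.
known_hashes = {sha256_hex(original)}


def signature_match(payload: bytes) -> bool:
    """True only if the payload is byte-for-byte identical to a known sample."""
    return sha256_hex(payload) in known_hashes


# Behaviors described above for Petya, written as hypothetical action labels.
CORE_BEHAVIOR = frozenset({"overwrite_mbr", "force_reboot", "encrypt_mft"})


def behavior_match(observed_actions, required=CORE_BEHAVIOR) -> bool:
    """True if every required action appears among the observed actions."""
    return required <= set(observed_actions)


# Two variants that differ only in how they spread.
email_variant = ["spread_via_email_attachment", "overwrite_mbr",
                 "force_reboot", "encrypt_mft"]
exploit_variant = ["spread_via_smb_exploit", "overwrite_mbr",
                   "force_reboot", "encrypt_mft"]

print(signature_match(original), signature_match(variant))
print(behavior_match(email_variant), behavior_match(exploit_variant))
```

The hash check misses the modified sample, while the check on core behaviors still matches both variants because it ignores the propagation step. A detector that keys on the propagation method instead would be bypassed in the way described above.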
Yet another challenge in attack detection is a class of attacks
called Advanced Persistent Threats (APTs). These tend to be
sophisticated and persistent over a longer time period [18][34].
The attackers gain illegal access to an organization’s network
and may go undetected for a significant time with knowledge
of the complete scope of attack remaining unknown. Unlike
other common threats, such as viruses and trojans, APTs
are implemented in multiple stages [34]. The stages broadly
include a reconnaissance (or surveillance) of the target network
or hosts, gaining illegal access, payload delivery, and execution
of malicious programs [3]. Although these steps remain the
same, the specific vulnerabilities used to perform them might
change from one APT to another. Hence, new approaches for
detecting threats (or APTs) should have the ability to adapt to
the evolving threats and thereby help detect the attacks early
on.
Our prototype system, detailed in Section IV, ingests knowledge
from different threat intelligence sources and represents
it in such a way that it can be directly used for attack
detection. Such fast adaptation helps our system
cater to changing threat landscapes. It also helps to reduce the
time gap problem described earlier. Moreover, the
knowledge graph and reasoning over it help to
identify variations in attacks.
III. RELATED WORK
A. Security & Event Management
As the complexity of threats and APTs grows, several
companies have released commercial platforms for security
information and event management (SIEM) that integrate
information from different sources. A typical SIEM has a num-
ber of features such as managing logs from disparate sources,
correlation analysis of various events, and mechanisms to
alert system administrators [35]. IBM’s QRadar, for example,
can manage logs, detect anomalies, assess vulnerabilities, and
perform forensic analysis of known incidents [15]. Its threat
intelligence comes from IBM’s X-Force [27]. Cisco’s Talos
[5] is another threat intelligence system. Many SIEMs⁶, such
as LogRhythm, Splunk, AlienVault, Micro Focus, McAfee,
LogPoint, Dell Technologies (RSA), Elastic, Rapid 7 and
Comodo, exist in the market with capabilities including
real-time monitoring, threat intelligence, behavior profiling, data
and user monitoring, application monitoring, log management
and analytics.
⁵ https://www.csoonline.com/article/3233210/ransomware/petya-ransomware-and-notpetya-malware-what-you-need-to-know-now.html
⁶ https://www.gartner.com/reviews/market/security-information-event-management/compare/logrhythm-vs-logpoint-vs-splunk
B. Ontology based Systems
Obrst et al. [29] detail a process to design an ontology for
the cybersecurity domain. The study is based on the diamond
model that defines malicious activity [16]. Ontologies are
constructed in a three-tier architecture consisting of a domain-
specific ontology at the lowest layer, a mid-level ontology that
clusters and defines multiple domains together and an upper-
level ontology that is defined to be as universal as possible.
Several ontologies designed later have used this
process.
Oltramari et al. [31] created CRATELO as a three-layered
ontology to characterize different network security threats.
The layers include an ontology for secure operations (OSCO)
that combines different domain ontologies, a security-related
middle ontology (SECCO) that extends security concepts, and
the DOLCE ontology [19] at the higher level. In Oltramari
et al. [30], a simplified version of the DOLCE ontology
(DOLCE-SPRAY) is used to show how a SQL injection attack
can be detected.
Ben-Asher et al. [2] designed a hybrid ontology-based model
combining a network packet-centric ontology (representing
network traffic) with an adaptive cognitive agent. The agent
learns how humans make decisions while defending against
malicious attacks. It is based on instance-based learning theory
and uses reinforcement learning to improve decision making
through experience. Gregio et al. [13] discuss a comprehensive
ontology to define malware behavior.
Each of these systems and ontologies looks at a narrow
subset of information, such as network traffic or host system
information, while SIEM products do not exploit the
capabilities of an ontological approach or of systems that
reason over ontologies. In this regard, Cognitive CyberSecurity (CCS)
takes a larger and more comprehensive view of security threats
by integrating information from multiple existing ontologies
as well as network and host-based sensors (including system
information). It creates a single representative view of the data
for system administrators and then provides a framework to
reason across these various sources of data.
This paper significantly improves on our previous work [37],
[38], [26] in this domain, where semantic rules were used to
detect cybersecurity attacks. CCS uses the Unified Cybersecurity
Ontology, a STIX-compliant schema, to represent,
integrate and enhance knowledge about cyber threat intelligence.
Current extensions to it help link standard cyber
kill chain phases to various host and network behaviors that
are detected by traditional sensors like Snort and monitoring
agents. Unlike our previous work, these extensions allow our
framework to assimilate incomplete text from sources so that
cybersecurity events can be detected in a cognitive manner.
IV. COGNITIVE APPROACH TO CYBERSECURITY
This section describes our approach to detect cybersecurity
attacks. It is inspired by the cognitive process used by humans
to assimilate diverse knowledge. The Oxford dictionary defines
cognition [7] as “the mental action or process of acquiring
knowledge and understanding through thought, experience,
and the senses”. Our cognitive strategy involves acquiring
knowledge and data from various intelligence sources and
combining them into an existing knowledge graph that is
already populated with cyber threat intelligence data about
attack patterns, previous attacks, tools used for attacks, indi-
cators, etc. This is then used to reason over the data from
multiple traditional and non-traditional sensors to detect and
predict cybersecurity events.
A novel feature of our framework is its ability to assimilate
information from dynamic textual sources and combine it
with malware behavioral information, detecting known and
unknown attacks. The main challenge with the textual sources
is that they are meant for human consumption and the information
can be incomplete. Moreover, the text is tailored to
a specific audience that already has some knowledge about
the topic. For instance, if the target audience of an article is
a security analyst, the line “WannaCry is a new ransomware.”
carries more semantic meaning than the text itself. Based on
their background knowledge, a security analyst can expand
the previous description and infer the following actions that
WannaCry may perform:
• WannaCry tries to encrypt sensitive files;
• A downloaded program may have initiated the encryption;
• Either downloaded keys or randomly generated keys are
used for encryption; and
• WannaCry modifies many sensitive files.
However, a machine cannot infer this knowledge from the
text alone. Our cognitive approach addresses this issue by
integrating the experiences or security threat concepts (attack
patterns, the actions performed and associated information
like source and target of attack) in a knowledge graph, and
combining it with new and potentially incomplete textual
knowledge using standard reasoning techniques.
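The expansion an analyst performs on a sentence such as “WannaCry is a new ransomware.” can be sketched as inheritance of behaviors along a class hierarchy. The dictionaries below are a simplified stand-in for an ontology and a reasoner, and all class and behavior labels are hypothetical; they are not UCO terms.

```python
# Background knowledge: behaviors attached to malware classes.
# All labels are hypothetical and chosen only for illustration.
CLASS_BEHAVIORS = {
    "malware": {"runs_unauthorized_code"},
    "ransomware": {
        "encrypts_sensitive_files",
        "modifies_many_sensitive_files",
        "uses_downloaded_or_generated_keys",
    },
}

# Class hierarchy: each class maps to its direct parent class.
SUBCLASS_OF = {"ransomware": "malware"}


def inferred_behaviors(instance, instance_of):
    """Collect behaviors from the instance's class and all of its ancestors."""
    behaviors = set()
    cls = instance_of.get(instance)
    while cls is not None:
        behaviors |= CLASS_BEHAVIORS.get(cls, set())
        cls = SUBCLASS_OF.get(cls)
    return behaviors


# The only fact extracted from the incomplete text.
facts = {"WannaCry": "ransomware"}

for behavior in sorted(inferred_behaviors("WannaCry", facts)):
    print(behavior)
```

A single extracted fact is enough to associate the new name with the behaviors already recorded for its class, which sensors can then watch for. An instance with no known class yields no inferred behaviors.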
To address the challenge of structurally storing and pro-
cessing such knowledge about the cybersecurity domain, we
use the intrusion kill chain, a general pattern observed in
most cybersecurity attacks. Hutchins et al. [14] described an
intrusion kill chain with the following seven steps.
• Reconnaissance: Gathering information about the target
and various existing attacks (e.g., port scanning, collect-
ing public information on hardware/software used, etc.)
• Weaponization: Combining a specific trojan (software
to provide remote access to a victim machine) with an
exploit (software to get first unauthorized access to the
victim machine, often exploiting vulnerabilities). Trojans
and exploits are chosen taking the knowledge from the
reconnaissance stage into consideration.
