A Comprehensive Review: Security and Privacy Challenges of Big Data


Added on 2019/09/30

A Review on Security and Privacy Challenges of Big Data
Manbir Singh, Malka N. Halgamuge*, Gullu Ekici and Charitha S. Jayasekara
School of Computing and Mathematics, Charles Sturt University, Melbourne, Victoria 3000, Australia
MHalgamuge@studygroup.com, GEkici@studygroup.com, subhashi@ieee.org
Abstract. Big data has a growing number of confidentiality and security issues. New technology undoubtedly brings people benefits, privileges,
convenience and efficiencies, but it also brings confidentiality issues. Additionally, technological advances are accompanied by threats that can
pose privacy risks more detrimental than expected. Data privacy is a source of much concern to researchers throughout the globe. A question
that remains unanswered is: what exactly can be done to resolve the confidentiality and privacy issues of big data? To answer this question,
the data for this chapter were gathered by analyzing 58 peer-reviewed articles published from 2007 to 2016, in order to find resolutions for
big data confidentiality issues. The articles come from different industries, including healthcare, finance, robotics, web applications,
social media, and mobile communication. The selected journal articles are used to make a comparative analysis of security issues in different
areas and to propose solutions. This chapter consists of four main parts: introduction, materials and methods, results, and discussion and
conclusion. This inquiry aimed to find different security issues of big data in various areas and to give solutions by analyzing the results.
The results of the content analysis suggest that internet applications and financial institutions face specific security problems, whereas
social media and other industries face confidentiality issues with sensitive information, which have heightened privacy concerns. Both issues
are addressed in this study, and the retrieved results highlight gaps that can be researched further. The data were gathered by analyzing
studies that each deal with a particular confidentiality issue. After analysis and evaluation, the suggested solutions to these
confidentiality issues are presented according to the algorithmic method they use. This research addresses gaps in the literature by
highlighting the security and privacy issues that big companies face with recent technological advancements in corporate societies. In doing
so, the research could shed new light on the issues of big data and provide future research directions to solve them.
Keywords: Big Data · Security · Privacy · Security Issues · Data Analysis
1 Introduction
A significant portion of information technology research effort goes into analyzing and monitoring data about events on
servers, networks and other connected devices. Big data is a fairly new concept in the modern technological world. With the
increasing use of big data in recent times, the problem of security has become very important [1]. This chapter covers
different aspects of big data security, in particular challenges related to big data variety, velocity and volume. The amount of
sensitive information that needs to be protected is constantly increasing [2], and in the current era this information must be
protected from hackers. Insufficient protection can bring various security challenges [3-5]. The notion of big data came into use
only recently; it refers to the large amounts of information that companies produce and need to store. This information is then
analyzed, for example to predict future sales trends from annual trends. The growing volume of such data raises storage issues,
as security measures for it are not yet established. In fact, any information that is generated raises security and privacy
issues. Confidentiality of sensitive information becomes a perplexing issue for companies unless they invest considerable time,
effort, and resources in it. Big data techniques are used to manage large numbers of datasets, because this vast amount of data is
often unstructured and collected from different sources [6,7]. Traditional access control mechanisms are no longer sufficient to
ensure privacy as demand grows, which creates the need for a fine-grained access control mechanism that reflects every aspect of
privacy. The framework proposed for this is an ontology-driven XACML context [5]. Providing privacy in the cloud, however, is much
more complex than one might think [8]. Most privacy-preservation techniques target small-scale data and fall short at big data
scale; to gain high scalability, an algorithm has been designed with MapReduce that performs the computation in parallel. This
method is called local recoding anonymization [8]. The hybrid cloud is a very different approach and is difficult to implement.
The idea is to separate sensitive data from non-sensitive data and store them in different trusted clouds. This isolation method
is best suited to processing image files, as it deals with data at rest [4]. A multilevel identity encryption method is used at
both the file level and the block level to protect data. This process helps users leverage the cloud provider because it is
transparent [9].
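The hybrid-cloud idea of separating sensitive from non-sensitive data can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of [4]: which fields count as sensitive (SENSITIVE_FIELDS) is hypothetical, and the two parts are returned as plain dictionaries rather than uploaded to real cloud stores. A random token links the two parts so that only a party with access to both stores can rejoin them.

```python
import uuid

# Hypothetical policy: which fields count as sensitive is an assumption for illustration.
SENSITIVE_FIELDS = frozenset({"name", "diagnosis", "card_number"})


def split_record(record, sensitive=SENSITIVE_FIELDS):
    """Split one record into a sensitive part and a non-sensitive part.

    Both parts carry the same random token so they can be rejoined later;
    the token reveals nothing about the record's contents.
    Assumes records have no field of their own named "token".
    """
    token = uuid.uuid4().hex
    sensitive_part = {"token": token}      # destined for the store trusted with sensitive data
    non_sensitive_part = {"token": token}  # destined for the store holding everything else
    for key, value in record.items():
        target = sensitive_part if key in sensitive else non_sensitive_part
        target[key] = value
    return sensitive_part, non_sensitive_part


def join_parts(sensitive_part, non_sensitive_part):
    """Rejoin the two parts of one record (requires access to both stores)."""
    if sensitive_part["token"] != non_sensitive_part["token"]:
        raise ValueError("parts belong to different records")
    merged = {**non_sensitive_part, **sensitive_part}
    del merged["token"]
    return merged


record = {"name": "Alice", "diagnosis": "flu", "city": "Melbourne"}
sensitive_part, non_sensitive_part = split_record(record)
print(sorted(non_sensitive_part))  # ['city', 'token']
print(join_parts(sensitive_part, non_sensitive_part) == record)  # True
```

A breach of the non-sensitive store alone exposes only the token and the non-sensitive fields.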

Preserving the privacy of information is one major issue; providing security for IT systems is another, and it needs to be
addressed urgently. There are many security risks in big data, the main one being privacy leakage, one of the most dangerous
issues, which has already caused many problems for companies [5]. Different kinds of data therefore need different computational
techniques to keep them secure. The first step in cloud security is to secure the entry points. This helps to detect possible
attacks [10] and to alert users through an intrusion detection system [11]. Encryption is another way of protecting data on a
local area network; for example, VPN encryption is used to safeguard data [12]. For web applications, a randomization technique
based on the Random4 encryption algorithm is used in a similar way [11]. Large corporations can use the MuteDB architecture,
which incorporates data encryption, key management, authorization, and authentication. This architecture offers a scalable
solution to guarantee the confidentiality of information in the database [6].
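Securing an entry point can be illustrated with a simple sliding-window rule that flags a source after repeated failed requests. This is a hypothetical sketch, not the detection method of [10] or [11]: the class name EntryPointMonitor, the failure threshold, and the window length are assumptions chosen for illustration, and a real intrusion detection system combines many more signals.

```python
from collections import defaultdict, deque


class EntryPointMonitor:
    """Sliding-window detector for repeated failed requests at an entry point.

    The threshold and window length are hypothetical defaults, not values
    taken from the cited intrusion detection systems.
    """

    def __init__(self, max_failures=5, window_seconds=60.0):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self._failures = defaultdict(deque)  # source address -> recent failure timestamps

    def record_failure(self, source, timestamp):
        """Record a failed attempt; return True if the source should trigger an alert.

        Timestamps are assumed to arrive in non-decreasing order.
        """
        events = self._failures[source]
        events.append(timestamp)
        # Discard failures that have fallen out of the sliding window.
        while timestamp - events[0] > self.window_seconds:
            events.popleft()
        return len(events) >= self.max_failures


monitor = EntryPointMonitor(max_failures=3, window_seconds=10)
alerts = [monitor.record_failure("203.0.113.7", t) for t in (0, 2, 4)]
print(alerts)  # [False, False, True]
```

Failures spread further apart than the window never accumulate, so occasional mistyped passwords do not raise alerts.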
Security of medical data falls into the same category, as both cloud-based technologies and attribute-based encryption are used
for its storage and retrieval [13]. Another approach is to use a pocket-sized computer, the Raspberry Pi, which ensures that
regional data is collected and kept isolated [14].
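In attribute-based encryption, access depends on whether a user's attributes satisfy a policy; in the ciphertext-policy variant, for example, the policy is an access tree of threshold gates attached to the ciphertext. The sketch below shows only that policy-satisfaction check and performs no cryptography. The Gate type, the helper names all_of and any_of, and the attribute strings are hypothetical, and [13] may use a different ABE variant.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Gate:
    """k-of-n threshold gate: AND is n-of-n, OR is 1-of-n."""
    threshold: int
    children: Tuple["Policy", ...]


Policy = Union[str, Gate]  # a leaf is a single required attribute


def all_of(*children):
    return Gate(len(children), children)


def any_of(*children):
    return Gate(1, children)


def satisfies(policy, attributes):
    """Return True if the attribute set satisfies the access tree."""
    if isinstance(policy, str):
        return policy in attributes
    met = sum(1 for child in policy.children if satisfies(child, attributes))
    return met >= policy.threshold


# Hypothetical policy for a medical record.
policy = all_of("role:doctor", any_of("dept:cardiology", "dept:emergency"))
print(satisfies(policy, {"role:doctor", "dept:emergency"}))  # True
print(satisfies(policy, {"role:nurse", "dept:emergency"}))   # False
```

In a real ABE scheme this check is enforced mathematically: a key whose attributes fail the policy simply cannot decrypt.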
2 Data Processing Method
Knowledge Discovery from Data (KDD) is often treated as a synonym for ‘data mining’. It is a method used to discover
information from data while avoiding leakage. Every day, millions of bytes of data are generated throughout the world. KDD in
turn allows companies to grow and remain competitive by identifying seasonal trends and launching products in the peak seasons.
Research in this area is therefore significant and requires further development. KDD usually involves four steps, performed
iteratively, as discussed below:
Step 1. Data preprocessing: This step identifies inconsistencies and missing data fields, removes them, and reintegrates the
data pool. The data are then presented in a form that can be read quickly, so that potential results can be generated.
Step 2. Data transformation: This step transforms the data into forms appropriate for mining. Raw data are not in a suitable
form, so they must be converted into a representation from which useful information can be generated.
Step 3. Data mining: In this step, algorithms are applied to extract information from the data pool.
Step 4. Pattern evaluation: After the information is extracted, the resulting patterns are evaluated to obtain knowledge about trends.
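The four KDD steps can be sketched as a toy pipeline over shopping-basket records. The concrete choices are assumptions for illustration only: preprocessing drops records with empty fields, transformation normalizes item names, mining counts co-occurring item pairs, and evaluation keeps pairs whose support reaches a threshold.

```python
from collections import Counter
from itertools import combinations


def preprocess(records, required_fields):
    """Step 1: drop records whose required fields are missing or empty."""
    return [r for r in records if all(r.get(f) not in (None, "") for f in required_fields)]


def transform(records, field):
    """Step 2: turn a comma-separated item string into a set of normalized items."""
    return [frozenset(item.strip().lower() for item in r[field].split(",")) for r in records]


def mine(transactions, size=2):
    """Step 3: count how often each combination of `size` items occurs together."""
    counts = Counter()
    for items in transactions:
        counts.update(combinations(sorted(items), size))
    return counts


def evaluate(counts, n_transactions, min_support=0.5):
    """Step 4: keep patterns whose support (share of transactions) meets the threshold."""
    return {p: c / n_transactions for p, c in counts.items() if c / n_transactions >= min_support}


raw = [
    {"id": 1, "items": "Bread, Milk"},
    {"id": 2, "items": "bread, milk, eggs"},
    {"id": 3, "items": ""},  # missing value: removed in step 1
    {"id": 4, "items": "milk, eggs"},
]
clean = preprocess(raw, ["items"])
transactions = transform(clean, "items")
patterns = evaluate(mine(transactions), len(transactions))
print(sorted(patterns))  # [('bread', 'milk'), ('eggs', 'milk')]
```

Because the process is iterative, an analyst would typically rerun the later steps, for example with a different min_support, after inspecting the patterns.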
2.1 Privacy
Every day, large amounts of data are generated and processed in an array of industries, so the privacy of data can be
protected only through methodical procedures. Four types of users are involved: (i) the data provider, (ii) the data
collector, (iii) the data miner, and (iv) the decision maker. These are the people involved in processing the data that is
collected and in deriving knowledge from it. Each faces different challenges to privacy protection, which are discussed below.
Approaches to privacy protection by each type of user:
Data Provider:
Data providers may supply data voluntarily, according to the demands of the data collector, and data collectors can also
retrieve data about the providers' daily activities. However, there are several ways for providers to limit the data collectors'
access to this data. Internet companies have a strong motivation to track users' movements across the Internet so that valuable
information can be extracted from the data produced by users' online activities. To limit this, providers can use tools such as
AdBlock, which block advertisements on sites and stop tracking scripts from running. Encryption tools can also be used to convert
data into ciphertext, which is unreadable without the key, so that it can be transferred safely.
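The blocking described above, where tracking scripts are stopped before they load, can be sketched as a domain blocklist check on each outgoing request. This is a simplified, hypothetical illustration: the BLOCKED_DOMAINS entries are reserved example domains, and real blockers such as AdBlock rely on far richer filter lists than plain domain-suffix matching.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist; real blockers use much richer filter-list syntax.
BLOCKED_DOMAINS = frozenset({"tracker.example", "ads.example.net"})


def is_blocked(url, blocked=BLOCKED_DOMAINS):
    """Return True if the URL's host is a blocked domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in blocked)


def filter_requests(urls, blocked=BLOCKED_DOMAINS):
    """Keep only the requests the page may load; blocked scripts are never fetched."""
    return [u for u in urls if not is_blocked(u, blocked)]


page_requests = [
    "https://news.example.org/app.js",
    "https://cdn.tracker.example/pixel.js",
    "https://ads.example.net/banner.js",
]
print(filter_requests(page_requests))  # ['https://news.example.org/app.js']
```

Matching on a leading dot ensures that a lookalike host such as nottracker.example is not blocked by mistake.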
Data Collector:
The original data retrieved from data providers generally contains sensitive information, which needs specific precautions
before being passed on to the data mining process. These precautions must also be taken before any sensitive information is
disclosed to the public; otherwise it can cause trouble for companies. The process of generalization replaces some values with a parent value, as
