Big Data Privacy Report


Added on  2019/09/25

Big Data Privacy

Abstract
In recent years, big data has become a prominent topic all over the world. As the volume of big data
increases, breaches of the privacy of people's data and information also increase. Big data requires
high computational power and large storage capacity, which leads organizations to use distributed
systems to manage their large amounts of data. Distributed systems typically involve many different
parties, which increases the privacy risk for users.
The main aim of this paper is to provide a complete overview of data privacy preservation methods in
big data and of the challenges facing existing mechanisms. In this paper, we explain the
infrastructure of big data and the privacy preservation mechanisms used at each stage of its life
cycle. We then discuss future research directions and challenges related to privacy preservation in
big data.
Keywords: Big Data, Privacy & Security, Big Data Processing
Introduction
Due to the rapid development of recent technology, the amount of data generated by different types
of IoT devices and applications is increasing day by day at a very high rate. The large volume of
data generated at a high rate, from many different sources and in many different formats, is termed
big data. Because of this high generation rate, it has become very hard to handle such data using
older systems and traditional processes. The properties of big data are mainly captured by the 3Vs:
volume, velocity, and variety. If big data can be captured and managed in a timely manner, it can be
very beneficial for both enterprises and users; for example, analysis of big data can support
business decision-making.
According to the International Data Corporation (IDC), big data is a new generation of technologies
and architectures designed to extract the required value from very large volumes of a wide variety
of data through analysis.
In the above figure, volume represents the amount of data generated, velocity represents the rate at
which data is generated through various IoT devices, and variety denotes the diverse nature of the
data. The enormous amount of data is responsible for an increase in privacy breaches. Most websites
and social applications use our data to analyze our day-to-day activities and browsing history for
commercial benefit. User privacy is breached under the following conditions:
a) When personal data and information are linked with external datasets, new facts about the user
may be revealed. These new facts may be important or secret and not meant to be known by others.
b) Personal information is sometimes collected to add business value by learning about customers'
personal details and habits.
c) When important data and information are not stored in a secure place, data leakage can occur
during the processing and storage phases.
In recent years, big data has played a major role in the IT industry by introducing completely new
concepts as well as new enterprise systems. Previous systems are old, somewhat expensive, and complex
to implement and use. To address this, cloud-based enterprise systems have recently been introduced
to provide flexibility, scalability, and independence in IT infrastructure. Owing to a lack of
research and literature, this area is still not fully explored, but it now attracts high interest
from general users. This paper explains the challenges and advantages of cloud enterprise systems as
well as their potential to ease users' daily lives.
Data Sources And Management
Nowadays, the amount of data in the world is increasing at an exponential rate, growing by 50% per
year. As a result, datasets have become so large and complex that new data management approaches
have emerged, such as:
1) Open Innovation 2) Open Data 3) Open Source (e.g., Hadoop).
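Taking the 50% annual growth rate as constant (a simplifying assumption; the report does not say how the figure was measured), its implications can be worked out as follows, where V_0 is today's data volume and t is time in years:

```latex
\begin{aligned}
V(t) &= V_0\,(1 + 0.5)^t = V_0\,(1.5)^t \\
t_{\text{double}} &= \frac{\ln 2}{\ln 1.5} \approx \frac{0.693}{0.405} \approx 1.71 \ \text{years} \\
\frac{V(10)}{V_0} &= 1.5^{10} \approx 57.7
\end{aligned}
```

In other words, at this rate the volume of data roughly doubles every 21 months and grows almost 58-fold in a decade.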
There are also three main characteristics of big data:
(i) volume (data quantity), (ii) velocity (data speed), and (iii) variety (data types) (P. Church, 2013).
Big web players such as Facebook, Google, and Amazon handle their data based on customer
interactions with their services. They have developed or adopted many big data tools to collect,
store, and analyze large quantities of data, such as DynamoDB, Bigtable, Cassandra, and Hadoop.
Facebook is a popular social network with 1.2 billion users worldwide. Apart from Google, Facebook is
the only company that stores such a high level of detailed customer information. To increase the
yield of data collection, agencies and researchers gain access to many more variables than are
strictly needed to answer their original hypotheses. In many cases, the collected data are not fully
explored or used by the original research team due to limitations in time, interest, or resources.
(Rita Sallam, 2018)
Big Data Infrastructure
As shown in the above figure, big data goes through different stages during its life cycle. To
manage different types of data with respect to their volume, speed, and variety, new systems have
been created to handle the high-speed flow of huge amounts of data coming from multiple data
sources.
Apache Hadoop is the main force behind the growth of the big data industry.
MapReduce is a programming framework developed by Google; Hadoop provides an open-source
implementation of it for processing big datasets residing on distributed servers in order to create
aggregated data. MapReduce has two distinct tasks that Hadoop programs perform. The first, the map
job, takes a set of data and converts it into another set of data in the form of key/value pairs.
The reduce job takes the output of a map as input and combines those data into a smaller set of
data.
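The map and reduce steps described above can be sketched with the classic word-count example. This is a single-process simulation in plain Python, assuming nothing about Hadoop's own API; the function names are hypothetical, and the explicit "shuffle" step stands in for the grouping that the framework performs between map and reduce.

```python
# A single-process sketch of the MapReduce model (word count).
# It only simulates the map, shuffle, and reduce steps in plain Python;
# it is not Hadoop's API and does not distribute any work.
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(document: str) -> List[Tuple[str, int]]:
    """Map: turn one input record into intermediate (key, value) pairs."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all intermediate values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a smaller result."""
    return key, sum(values)


def map_reduce(documents: Iterable[str]) -> Dict[str, int]:
    intermediate = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    docs = ["big data needs privacy", "privacy in big data"]
    print(map_reduce(docs))
```

In a real cluster, each map call runs on the server that holds its slice of the input, and only the much smaller reduced output is collected, which is what makes the model suitable for data spread across distributed servers.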
Big Data Life Cycle
(A) Generation of Data
Data is now generated by many types of sources; both machines and humans contribute to the huge
volume of data being generated. Because this data is large, complex, and diverse in nature, it is
difficult to handle with previous system designs.
Privacy in the generation phase
Data generation is mainly divided into active and passive generation. In active generation, the data
owner provides their data to a third party willingly, whereas in passive generation, user data is
used by a third party without the user's awareness, for example by collecting the user's browsing
details. The big problem for users nowadays is how to protect their data from third parties who want
to obtain it at any cost and by any method. Users can reduce the privacy risk during the data
generation phase by using access restriction and data falsifying methods.
a) Access Restriction
If data owners do not want to share their data with a third party, they can simply refuse to share
it and use more effective access control methods to prevent its collection. They can use
anti-tracking tools, ad blockers, and encryption tools, which are mainly designed as browser
extensions. Anti-malware and antivirus software help protect sensitive data against threats from
unauthorized sources.
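Anti-tracking extensions and ad blockers commonly work by checking each outgoing request against a list of known tracker domains. The sketch below is a minimal illustration of that idea, assuming a hypothetical two-entry blocklist; real extensions use large curated lists and richer matching rules.

```python
# Minimal sketch of how a blocklist-based anti-tracking extension might decide
# whether to let a request through. The tracker domains are hypothetical.
from urllib.parse import urlparse

BLOCKLIST = {"tracker.example", "ads.example"}


def is_blocked(url: str, blocklist=BLOCKLIST) -> bool:
    """Block a request if its host is a listed domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in blocklist)


if __name__ == "__main__":
    for url in ["https://news.example/article",
                "https://cdn.tracker.example/pixel.gif",
                "https://ads.example/banner.js"]:
        print(url, "blocked" if is_blocked(url) else "allowed")
```

Matching on the "." boundary means a subdomain such as cdn.tracker.example is blocked, while an unrelated domain that merely ends in the same letters is not.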
b) Data Falsifying
In some cases, when sensitive data cannot be protected from third-party access, users can distort
the data using specific tools so that third parties cannot access the original contents. Tools such
as Sockpuppet and MaskMe are mainly used to hide or mask the original identity or details of the
user from third parties.
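The masking idea can be sketched as follows: each third-party site receives a different alias instead of the user's real e-mail address, and only the owner knows how aliases map back. This is a hypothetical illustration of the general approach, assuming an HMAC-derived alias and an invented forwarding domain; it is not how MaskMe or any other product is actually implemented.

```python
# Minimal sketch of identity masking in the spirit of tools such as MaskMe:
# each third-party site sees a different alias instead of the real e-mail.
# The alias domain, key handling, and class names are hypothetical; this is
# not how any particular product is implemented.
import hashlib
import hmac
from typing import Dict

ALIAS_DOMAIN = "masked.example"  # hypothetical forwarding domain


def alias_for(site: str, key: bytes) -> str:
    """Derive a stable per-site alias; without the key it reveals nothing about the owner."""
    digest = hmac.new(key, site.encode(), hashlib.sha256).hexdigest()[:12]
    return f"{digest}@{ALIAS_DOMAIN}"


class Masker:
    """Owner-side record of which alias was given to which site."""

    def __init__(self, real_email: str, key: bytes):
        self.real_email = real_email
        self.key = key
        self.issued: Dict[str, str] = {}  # alias -> site

    def share_with(self, site: str) -> str:
        alias = alias_for(site, self.key)
        self.issued[alias] = site
        return alias

    def forward_to(self, alias: str) -> str:
        """Mail arriving at a known alias is delivered to the real address."""
        if alias not in self.issued:
            raise KeyError(f"unknown alias {alias}")
        return self.real_email

    def leaked_by(self, alias: str) -> str:
        """If spam reaches an alias, this names the site that received it."""
        return self.issued[alias]


if __name__ == "__main__":
    m = Masker("alice@real.example", key=b"owner-secret")
    print(m.share_with("shop.example"))
```

Because aliases differ per site, two third parties cannot link their records on the e-mail field, and the owner can tell which site passed an alias on.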
(B) Storage of Data
This phase is mainly responsible for managing and storing large datasets. It mainly includes two
parts: hardware infrastructure and data management. Distributed storage is mainly used in the
hardware infrastructure, while a set of software is deployed to manage the large number of datasets.
Privacy in the storage phase
Nowadays, storing large amounts of data is not a big challenge because of the boom in new data
storage technologies such as cloud computing. However, if a big data storage system is compromised,
it becomes a serious threat to users' personal data and information. Data stored in the cloud has
three major security dimensions: confidentiality, integrity, and availability. Of these, the first
two are directly related to data privacy.
A basic requirement of a big data storage system is to protect the privacy of its users. There are
different methods to protect user privacy when data is stored in the cloud.
When cloud computing is used to store big data from IoT devices, the data owner loses direct
control over their data. Data integrity verification is therefore very important to ensure the
privacy of cloud users, because it protects the data against various threats such as unauthorized
modification.
Data owners can perform integrity verification themselves or delegate it to a third party. In
recent times, data and information have become knowledge, knowledge is power, and power invariably
converts into money: the more data you have, the more money you can make.
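A minimal sketch of integrity verification, assuming a simplified scheme: the owner splits the data into blocks, attaches an HMAC tag to each block before uploading, and later spot-checks randomly chosen blocks. The tag format and function names are hypothetical. With a plain HMAC, a third-party auditor would need the owner's key; published public-auditing schemes use more advanced cryptography to avoid that.

```python
# Simplified sketch of data integrity verification for outsourced (cloud) data.
# Assumptions (hypothetical, for illustration only): the owner holds a secret
# key, tags are HMAC-SHA256 over (block index, block), and the cloud stores
# each block together with its tag. This is not a full provable-data-possession
# or proof-of-retrievability scheme.
import hashlib
import hmac
import random
from typing import List, Tuple

Stored = List[Tuple[bytes, bytes]]  # (block, tag) pairs held by the cloud


def tag(key: bytes, index: int, block: bytes) -> bytes:
    # Binding the index into the tag stops the server from swapping blocks.
    msg = index.to_bytes(8, "big") + block
    return hmac.new(key, msg, hashlib.sha256).digest()


def outsource(key: bytes, data: bytes, block_size: int) -> Stored:
    """Owner side: split data into blocks and attach a tag to each one."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    return [(b, tag(key, i, b)) for i, b in enumerate(blocks)]


def audit(key: bytes, cloud: Stored, n_blocks: int, samples: int,
          rng: random.Random) -> bool:
    """Verifier side: spot-check randomly chosen blocks against their tags."""
    if len(cloud) != n_blocks:  # missing or extra blocks fail immediately
        return False
    for i in rng.sample(range(n_blocks), k=min(samples, n_blocks)):
        block, stored_tag = cloud[i]
        if not hmac.compare_digest(tag(key, i, block), stored_tag):
            return False
    return True


if __name__ == "__main__":
    key = b"owner-secret-key"
    cloud = outsource(key, bytes(range(256)) * 4, block_size=128)
    print(audit(key, cloud, n_blocks=8, samples=3, rng=random.Random(1)))
```

Checking only a random sample keeps each audit cheap; a corrupted block is caught with a probability that grows with the number of samples, and checking every block makes detection certain.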
(C) Data Processing
Data processing mainly refers to the collection and exchange of data and the extraction of useful
information from it. During the data transmission phase, after raw data is collected from many data
sources, it is transferred to different types of analytical applications. To handle the risks this
involves, standards are being set that address all forms of risk and provide reliability and
consistency.
Nowadays, emerging data analytics is classified into different areas: a) text analysis, b) network
analysis, c) web analysis, d) data analysis, and e) mobile analysis.
Privacy Preservation in the Data Processing Phase
Privacy protection during data processing mainly consists of two phases. In the first phase, the
collected user data is protected so that users' sensitive information is not exposed; in the second
phase, useful information is extracted from data sources without compromising the privacy of the
data. Personal data and information related to legal matters are collected by courts and other legal
authorities. The change from paper to electronic or digital records requires changes in security
methods and new security measures to prevent these types of threats. The social problems this
causes can be addressed while fully preserving individual dignity in this information age.
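The second phase, extracting information without compromising privacy, can be illustrated with k-anonymity: quasi-identifiers are generalized until every combination of them is shared by at least k records. The report does not name a specific technique, so k-anonymity is used here only as one well-known example; the records and generalization rules are hypothetical.

```python
# Minimal sketch of k-anonymity, used here as one example of releasing data
# for analysis without exposing individuals (the report does not name a
# specific technique). Records and generalisation rules are hypothetical.
from collections import Counter
from typing import Any, Dict, List

Record = Dict[str, Any]


def generalize(record: Record) -> Record:
    """Coarsen quasi-identifiers: 10-year age band and 3-digit ZIP prefix."""
    low = (record["age"] // 10) * 10
    return {
        "age": f"{low}-{low + 9}",
        "zip": str(record["zip"])[:3] + "**",
        "diagnosis": record["diagnosis"],  # sensitive value kept for analysis
    }


def is_k_anonymous(records: List[Record], quasi_ids=("age", "zip"), k: int = 2) -> bool:
    """True if every combination of quasi-identifier values occurs at least k times."""
    counts = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return all(c >= k for c in counts.values())


raw = [
    {"age": 34, "zip": "10001", "diagnosis": "flu"},
    {"age": 37, "zip": "10002", "diagnosis": "asthma"},
    {"age": 52, "zip": "20011", "diagnosis": "diabetes"},
    {"age": 58, "zip": "20019", "diagnosis": "flu"},
]
released = [generalize(r) for r in raw]

if __name__ == "__main__":
    print(is_k_anonymous(raw), is_k_anonymous(released))
```

The raw records are all unique on (age, ZIP), so each one could be linked to an outside dataset; after generalization, each record hides among at least two others while the diagnoses remain available for aggregate analysis.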
Conclusion & Future Scope
As the amount of data increases day by day, it is hardly possible to develop future applications
without implementing and using various types of data algorithms. In this report, we have discussed
the privacy issues at every stage of the data life cycle and explained the pros and cons of existing
technologies for big data applications. Successful big data analysis mainly requires a systematic
process that recognizes the problems of utilizing existing data and explains the different
characteristics of secondary analysis. All over the world, electronic information sources for big
data are becoming more efficient and effective. The overall goal of these tools is similar to that
of others: to contribute to scientific knowledge by offering alternative solutions that rely mainly
on existing data. In the coming years, the large amounts of data being collected and archived by
researchers across the globe will become more easily accessible. (Alys Woodward, 2018)
To ensure that data can be used only by authenticated users and that data transmission is secured
end to end, different types of encryption methods can be used, such as proxy re-encryption (PRE),
attribute-based encryption (ABE), and identity-based encryption (IBE). To extract the full value
from a dataset, it often has to be shared between different companies, which may use different
cryptographic algorithms to encrypt and decrypt the data. This can lead to data leakage and invade
the privacy of users. To solve this, encryption techniques are needed that permit data to be shared
securely without repeated decryption and re-encryption. As time passes, more and more personal data
will be collected and stored on centralized cloud servers. However, in centralized storage, a single
failure can lead to complete loss of the data. To prevent this, many researchers suggest
decentralized storage, which requires strong algorithms able to handle big data distributed over the
network.
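The advantage of decentralized storage over a single central server can be sketched by splitting data into chunks and replicating each chunk on several independent nodes, so that the failure of any one node loses nothing. The node names, chunk size, and replication factor below are hypothetical, and placement uses a simple per-chunk hash ranking in the style of rendezvous hashing; production systems add far more (consistency, repair, encryption).

```python
# Minimal sketch of decentralised storage: data is split into chunks and each
# chunk is replicated on several independent nodes, so losing any single node
# does not lose the data. Node names and parameters are hypothetical.
import hashlib
from typing import Dict, List, Set

CHUNK_SIZE = 8  # bytes; tiny so the example is easy to follow
REPLICAS = 2


def place(chunk_id: int, nodes: List[str], replicas: int = REPLICAS) -> List[str]:
    """Pick `replicas` distinct nodes for a chunk by ranking nodes on a per-chunk hash."""
    ranked = sorted(nodes, key=lambda n: hashlib.sha256(f"{n}:{chunk_id}".encode()).hexdigest())
    return ranked[:replicas]


def store(data: bytes, nodes: List[str]) -> Dict[str, Dict[int, bytes]]:
    """Split data into chunks and copy each chunk to its chosen nodes."""
    cluster: Dict[str, Dict[int, bytes]] = {n: {} for n in nodes}
    for cid, start in enumerate(range(0, len(data), CHUNK_SIZE)):
        for node in place(cid, nodes):
            cluster[node][cid] = data[start:start + CHUNK_SIZE]
    return cluster


def retrieve(cluster: Dict[str, Dict[int, bytes]], failed: Set[str], n_chunks: int) -> bytes:
    """Rebuild the data from whichever live node still holds each chunk."""
    out = []
    for cid in range(n_chunks):
        chunk = next((cluster[n][cid] for n in cluster
                      if n not in failed and cid in cluster[n]), None)
        if chunk is None:
            raise RuntimeError(f"chunk {cid} lost")
        out.append(chunk)
    return b"".join(out)


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c", "node-d"]
    data = b"personal records spread across many nodes"
    cluster = store(data, nodes)
    n_chunks = -(-len(data) // CHUNK_SIZE)  # ceiling division
    print(retrieve(cluster, failed={"node-b"}, n_chunks=n_chunks) == data)
```

With two replicas per chunk, any single node can fail and the data is still recoverable; tolerating more simultaneous failures requires a higher replication factor or erasure coding.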