Investigating Big Data Challenges within IoT and Cloud Infrastructure

Running head: BIG DATA CHALLENGES IN IOT AND CLOUD
BIG DATA CHALLENGES IN IOT AND CLOUD
ALAPATI SESHUBABU
University Name
Author Note:
Milestone 1
Executive Summary
The main aim of this research report is to understand the various challenges of big data in the Internet of Things and the cloud. Big data is considered one of the most effective technologies for data analysis: data sets can be analyzed for insights that lead to better business decisions and strategic business moves. The Internet of Things, on the other hand, is an interconnection of devices that lets users send and receive data and information. Cloud computing is an effective method of transferring and storing data over an internet connection. Big data can bring out various advantages when it is combined with the Internet of Things and cloud computing; however, several major issues and challenges can arise in the process. This research report describes the important details related to those challenges.
Milestone 2
Introduction to the Problems
One of the main features of utilizing IoT is observing the behavior of many things in order to gain important insights, optimize processes and so on. By utilizing big data, several problems are solved, such as storing all the events, running analytical queries over the stored events and performing analytics (machine learning and data mining) over the data to gain insights (Cai et al., 2017). However, there are major challenges in achieving the ability to perform real-time analytics with big data, such as correlating streaming events with stored data in the operational database, processing streaming events on the fly and storing streaming events in the operational database. The rapid growth of IoT applications also presents many challenges associated with big data. Although many problems are likely to be solved, sufficient research is still needed to mitigate the problems confronted. Data storage and management raise challenges related to the velocity, variety and volume of big data (Lee et al., 2013). Big data velocity requires the storage system to scale up quickly, which is hard to achieve with traditional data protection mechanisms. Moreover, big data on the cloud is very expensive when vast data volumes are used. There are other challenges for big data in the data transmission process at several stages of the lifecycle; most of them occur during data collection, data integration and data management, and transferring large amounts of data introduces obvious risks at each stage. It is also hard for traditional systems to efficiently analyze, manage and visualize unstructured data (Barnaghi, Sheth & Henson, 2013). The main reason for the huge amount of data produced by IoT-enabled devices is the growing number of internet-enabled devices used for various purposes by enterprises, individuals and governments. The large amount of data generated by IoT devices needs more computational power to process, and the data produced by IoT devices in several application domains is time-critical. Processing this vast amount of data in a timely manner is very demanding; as a result, whole processes, plans and effective decisions are formulated for several sets of applications.
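To make the three streaming capabilities mentioned above more concrete, the following is a minimal, single-machine sketch in Python; it is illustrative only, and the event fields, device identifiers and in-memory stores are assumptions rather than part of any particular IoT platform.

from collections import deque
from statistics import mean

# Stored reference data standing in for an operational database (illustrative).
operational_db = {"sensor-1": {"location": "warehouse"}}
event_log = []                 # stand-in for storing streaming events
window = deque(maxlen=10)      # rolling window for on-the-fly processing

def handle_event(event):
    # 1. Correlate the streaming event with stored data.
    context = operational_db.get(event["device_id"], {})
    # 2. Process the event on the fly (rolling average of recent readings).
    window.append(event["temperature"])
    rolling_avg = mean(window)
    # 3. Store the enriched event in the operational store.
    event_log.append({**event, **context, "rolling_avg": rolling_avg})

for reading in ({"device_id": "sensor-1", "temperature": 20 + i} for i in range(25)):
    handle_event(reading)

print(len(event_log), event_log[-1])

In a production setting a stream processor and a real database would replace these in-memory structures, but the three responsibilities remain the same.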
Structure of the report
This section provides a clear description of the structure followed in this report. The report contains several sections describing the big data challenges associated with IoT and the cloud, and every section has a proper heading with suitable subheadings to present the topic broadly. The structure followed is as follows:
Executive Summary: The executive summary is the first part of the report, where a summary of the whole report is presented. This section discusses the big data challenges in IoT and helps the reader understand the whole report in a short time. It also provides a brief summary of the whole content.
Introduction: This part discusses the basic challenges associated with the topic and provides a basic understanding of it. It broadly describes the basic knowledge of big data and the challenges that arise as data sets grow, and it explains familiar applications of big data associated with IoT and the cloud.
Literature Review: This part of the project discusses the published literature on the relevant topic. It provides a broad understanding of previous literature and the findings of that literature, with reflection. This section provides a broad review of existing research papers gathered through many sources, such as Google Scholar and e-libraries.
Methodology: This is one of the crucial parts of the project, where a suitable method is chosen for completing the report. After reviewing and comparing past methodologies, my own perspectives are illustrated. This section presents five methodologies which are utilized to develop the desired knowledge of big data challenges associated with IoT and the cloud.
Conclusion: In this part the result of the whole report is presented. This part widely discusses the big data challenges in IoT and how to measure them in terms of efficiency, feasibility and connectivity.
Milestone 3
Literature Review
Introduction
Nowadays, several changes are taking place in the technology field, especially on the platforms of cloud, big data and the Internet of Things. The internet and its number of users are growing rapidly, producing a vast amount of data day by day. According to some research (Bessis & Dobre, 2014), more than 2.35 quintillion bytes are produced every day. This vast amount of data is created by users across many applications, and it is becoming hard to manage the data being produced for so many different uses. The collaboration of machines and humans is producing many challenges as well.
According to Chen, Mao & Liu (2014), most of the devices and systems used on a daily basis produce vast amounts of data that are not always visible, remaining as hidden information stored in the devices. The Internet of Things is triggering database management growth for the future, and challenges arise in managing large data sets from every system and device. Predictive capabilities of applications and services, and higher-level decision making, are essential to gain the full advantage of context-aware, data-intensive services and to make important, valuable information transparent and available at a much higher frequency. As per Chen et al. (2015), big data has become essential for turning the vast amount of data into information that can be transferred at high frequency. Big data can be defined as high-velocity, high-volume and high-variety information that enables innovative ways of information processing and enhanced insights through automation, effective decision making and analytical tools. According to a research paper from Gartner, almost every business, enterprise and individual is producing vast amounts of data that can be analyzed through automated processes to manage it more effectively (Lee & Lee, 2015). In fact, the life of data can be very short and it may become obsolete after some time, so efficient use of big data analytics is needed to draw useful insights from a high volume and variety of data. Sometimes the quality of the data is a concern, because data is fetched from different applications for making decisions, and not all the data captured from various devices is useful for decision making (Li, Da Xu & Zhao, 2015). Such data is just information, and the information has to be converted into knowledge for decision making.
Fan & Bifet (2013) state that the large and complex volume of big data is very tough to manage and analyze. Effective data management is not possible using a typical relational database management system (RDBMS). For vast amounts of big data it is also very hard to extract the right information in an effective manner. Generally, raw big data is the source of the desired information: data items are related to each other only as raw input, and an individual data point has little value on its own, whereas a vast amount of data provides meaningful information through its patterns and trends.
Present and Past Work
There are several research papers available on the web. Some of this research has tried to estimate all the challenges that could possibly arise from the overwhelming amount of data (Jin et al., 2014). Technology shifts more and more data through various applications such as wireless sensors, smart devices and social media. This paper focuses on improving the performance of old services and offering new services in an open and dynamic environment.
Big Data Definition
Big data is a specific type of data set that is extremely complex as well as voluminous, and traditional data processing application software is not sufficient for dealing with such complex data sets (Gudivada, Baeza-Yates & Raghavan, 2015). Big data is significant because it can deal with any type of data, with no restriction on data size or limit. It can be referred to as an evolving term that describes all types of voluminous unstructured, structured and semi-structured data holding strong potential for information mining. This type of data is commonly characterized with the help of the 3Vs (Mishra, Lin & Chang, 2015): the high volume of the data, the broad variety of data and the high velocity of data. There is no specific data volume that defines big data, so the term is used to describe the petabytes, terabytes and even exabytes captured over time. This type of data inundates organizations on a day-to-day basis. Big data can be easily analyzed for insights that lead to better business decisions and strategic business moves.
Big data also refers to the use of user behavior analytics, predictive analytics and other methods of advanced data analytics. These advanced analytics are responsible for extracting significant value from data, regardless of the size of the data set (Obitko, Jirkovský & Bezdíček, 2013). Proper analysis of such data sets can uncover new correlations that help in spotting business trends, preventing diseases, combating crime and much more.
Description of 3Vs in Big Data
As per Hassanalieragh et al. (2015), big data has become explicitly popular and well accepted by businesses largely through the description of the 3Vs: volume, velocity and variety. Organizations collect data from a wide variety of sources, including business transactions, social media, sensor data and machine-to-machine data. In earlier days, storing such volumes of data was extremely difficult; however, with the rise of technologies such as Hadoop, these problems have been greatly reduced (Sivarajah et al., 2017). The next of the 3Vs is velocity. Data streams in at an unprecedented speed and must be dealt with in a timely manner; RFID tags, sensors and smart metering are driving the need to deal with torrents of data in near-real time. This is extremely important for any organization or business, as it increases the speed of business processes and operations. The final V is variety, which refers to the fact that data comes in every format: structured numeric data in traditional databases as well as unstructured data such as emails, audio, video and financial transactions (Gubbi et al., 2013).
Apart from the above-mentioned 3Vs, two more dimensions are present within big data: variability and complexity. Variability refers to the fact that big data does not have a consistent, fixed structure (Hashem et al., 2015); there is always a high chance of variation or change, so easy adaptation and data change are possible in big data. Moreover, because it comes from multiple sources, big data is complex to process, and this complexity can rarely be ignored.
Big Data in Internet of Things and Cloud
The Internet of Things can be defined as a network of physical devices, home appliances, vehicles and other items embedded with software, electronics, actuators, connectivity and sensors that enable these objects to connect and exchange data and information (Yang, 2014). Each of these things can be uniquely identified by its embedded computing system and is able to interoperate within the existing infrastructure of the internet. The Internet of Things enables objects to be sensed and controlled remotely across the existing network infrastructure, which creates opportunities to integrate the physical world directly into computer-based systems. This improves efficiency, provides economic advantages and increases the accuracy of systems, while reducing human effort and intervention (Strohbach et al., 2015). When IoT or the cloud is augmented with sensors and actuators, the technology becomes an instance of the more general class of cyber-physical systems, which also encompasses technologies such as smart grids, virtual power plants, smart homes, intelligent transportation and smart cities.
Big data and the Internet of Things or cloud computing are both extremely important technologies in the technological world, and when these two popular technologies are combined they produce some of the best capabilities available. The rapid growth in the number of devices connected to the Internet of Things, together with the exponential increase in data consumption, reflects how the growth of big data overlaps with the growth of IoT (Wortmann & Flüchter, 2015). Managing big data within a continuously expanding network raises non-trivial concerns about the efficiency of data collection, data security, data analytics and data processing. To address these concerns, researchers have examined each challenge linked with the proper deployment of the Internet of Things. Although there are various studies on big data and the Internet of Things, there is a strong convergence of these areas, creating many opportunities for big data analytics to flourish in Internet of Things and cloud computing systems.
Big Data Issues in IoT and Cloud with Solutions
Big data brings various issues and challenges when it is combined with the Internet of Things and the cloud. The main challenges, with their probable solutions, are given below:
i) Heterogeneity and Incompleteness: The first and foremost challenge of big data in the Internet of Things and the cloud is heterogeneity and incompleteness (Zaslavsky, Perera & Georgakopoulos, 2013). The difficulties of big data analysis derive from its large scale and from the presence of mixed data, based on different patterns or rules, in the collected and stored data. In complicated heterogeneous mixtures, the data exhibits several patterns and rules whose properties vary greatly. Moreover, there are various uncertainties in the data, and data may lose its integrity while it is being managed. All of this occurs when big data works with the Internet of Things (Zanella et al., 2014). The solution to this problem is to collect and store the data properly to reduce heterogeneity, while the issue of data incompleteness can be addressed through qualitative data analysis (a small sketch of handling heterogeneous, incomplete records appears after this list).
ii) Working with Large Sets of Data: A further challenge is that big data techniques do not work well with smaller data sets, so only huge data sets are involved in the process. Users often suffer from issues because they are not able to link the Internet of Things with big data when the data set is small. To solve this problem, various approaches need to be taken by the user.
iii) Scale and Complexity: The third significant challenge of big data in IoT and the cloud is that it is extremely complex, and managing the ever-increasing data volume is one of the most challenging issues for any user. The user therefore needs to manage and control the data properly, and the solution to this problem is to manage the data with the help of appropriate software tools.
iv) Data Clustering: Another significant issue with big data in the Internet of Things is data clustering. Data clustering refers to the task of grouping a set of objects in such a way that objects within the same cluster are more similar to each other than to those in other clusters (Da Xu, He & Li, 2014). This becomes a major problem for the Internet of Things and cloud computing when they cannot work with clustered data; one suggested solution is to store the data without clustering it.
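As an illustration of the clustering task described in issue iv, the short Python sketch below groups a handful of made-up sensor readings with k-means. It assumes scikit-learn and NumPy are installed, and the feature values are invented for the example.

import numpy as np
from sklearn.cluster import KMeans

# Each row is one IoT reading: [temperature, humidity] (made-up values).
readings = np.array([
    [20.1, 30.0], [19.8, 31.2], [20.5, 29.5],   # one operating regime
    [35.2, 60.1], [34.8, 61.0], [36.0, 59.4],   # another operating regime
])

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(readings)
print(model.labels_)            # cluster assignment for each reading
print(model.cluster_centers_)   # representative centre of each cluster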
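For the heterogeneity and incompleteness challenge in issue i, a hedged sketch of one common approach is shown below: merging records with different schemas and filling the gaps. It assumes pandas is installed, and the field names and imputation choices are illustrative rather than a prescribed solution.

import pandas as pd

# Records from two device types with different schemas and missing fields.
records = [
    {"device_id": "sensor-1", "temperature": 21.5},
    {"device_id": "cam-7", "frames_dropped": 3},
    {"device_id": "sensor-1", "temperature": None},
]

df = pd.DataFrame(records)      # union of both schemas; gaps become NaN
df["temperature"] = df["temperature"].fillna(df["temperature"].mean())   # simple mean imputation
df["frames_dropped"] = df["frames_dropped"].fillna(0)                    # treat missing counts as zero
print(df)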
Methodologies
Method 1: Hadoop:
Hadoop is an open-source project managed by the Apache Software Foundation. Using Hadoop, it is easy to collect and handle big data. It is designed to parallelize the processing of data across computing nodes in order to speed up computations and to hide latency (Perera et al., 2014). Hadoop mainly consists of two components: the Hadoop Distributed File System (HDFS) and the MapReduce engine. HDFS stores enormous amounts of data reliably and streams it to user applications at high bandwidth. MapReduce can be considered a framework used for processing massive data sets in a distributed fashion across numerous machines.
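As a small illustration of storing collected data in HDFS, the sketch below shells out to the standard hdfs dfs client from Python. It assumes a configured Hadoop installation whose hdfs command is on the PATH; the local file name and HDFS directory are illustrative assumptions.

import subprocess

local_file = "events.csv"        # a local batch of collected IoT events (assumed to exist)
hdfs_dir = "/data/iot/events"    # illustrative HDFS target directory

# Create the target directory, upload the file (overwriting any old copy) and list the result.
subprocess.run(["hdfs", "dfs", "-mkdir", "-p", hdfs_dir], check=True)
subprocess.run(["hdfs", "dfs", "-put", "-f", local_file, hdfs_dir], check=True)
subprocess.run(["hdfs", "dfs", "-ls", hdfs_dir], check=True)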
Method 2: Map Reduce:
MapReduce is constructed as a broad programming paradigm. The original implementations provide the key requirements of parallel execution, fault tolerance, load balancing and data manipulation. It takes its name from the two abilities it borrows from existing functional programming languages: map and reduce (Stankovic, 2014). The MapReduce framework gathers together all the records that share a common key and then joins them, initially forming one group for each distinct key produced. It is a technique and an algorithmic model into which the data must be fitted, so to obtain the best from MapReduce a suitable algorithm is needed. For this there is a need for a collection of products
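The following single-machine Python sketch illustrates the map, shuffle and reduce phases described above by counting events per device; the record layout is invented for the example, and Hadoop would execute the same logic in parallel across many machines.

from collections import defaultdict

def map_fn(record):
    # Map phase: emit one (device_id, 1) pair per event record.
    yield record["device_id"], 1

def reduce_fn(key, values):
    # Reduce phase: combine all values that share a key (count events per device).
    return key, sum(values)

records = [{"device_id": "sensor-1"}, {"device_id": "sensor-2"}, {"device_id": "sensor-1"}]

# Shuffle phase: group intermediate pairs by key.
groups = defaultdict(list)
for record in records:
    for key, value in map_fn(record):
        groups[key].append(value)

print([reduce_fn(key, values) for key, values in groups.items()])
# -> [('sensor-1', 2), ('sensor-2', 1)]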