Examining the Techniques Used in Big Data Cloud Computing

Contents
Introduction
Why Big Data in the Cloud
Techniques used in Cloud Computing
Possible future developments
Conclusion
References
Introduction
Big data is the field concerned with methods to systematically analyse, extract information from, or otherwise deal with data sets that are too large or complex to be handled effectively by traditional data-processing application software. The field faces several challenges (Stephenson, 2013), including data capture, storage, analysis, sharing, search, visualisation, updating and transfer, as well as problems relating to the privacy of data and its sources. Cloud computing has become a major part of modern business because data has become integral to business. Its key advantage is that IT resources, especially computing power and data storage, are available on demand without direct active management by the user. The term is generally used to describe data centres made available to many users over the internet (Antonopoulos & Gillam, 2010). Since different companies have different data requirements, cloud computing infrastructure varies accordingly, and it can resolve many of a company's data-management problems. Three service delivery models are in use: Software as a Service (SaaS), Platform as a Service (PaaS) and Infrastructure as a Service (IaaS). Cloud computing is generally categorised on two bases: the location of the cloud, and the kind of service offered. Key techniques used in cloud computing include data storage technology, data management technology, task scheduling models and programming models.
Why Big Data in the Cloud
Big data involves the manipulation of petabytes of data. Such a huge volume can be managed far more easily in the scalable environment of the cloud, making it possible to deploy the data-intensive applications that power business analytics. A major benefit of the cloud is that it simplifies collaboration and connectivity within a firm: workers gain access to the analytics they need, which streamlines data sharing. IT leaders can readily see the benefits of placing big data in the cloud (Chan, 2018); the case may be less obvious to ordinary executives or other stakeholders. Big data in the cloud gives executives a better way to strengthen data-driven decision making. With the relevant data on hand, defects can be tracked effectively and supply chains optimised more easily, and customer engagement and loyalty can be improved. It also opens new opportunities for cost reduction, strategic investment and revenue growth. Big data, combined with a flexible cloud platform, can change the way a firm does business and achieves its objectives.
Techniques used in Cloud Computing
Different companies use different cloud computing techniques to manage the volume of data generated in their operations. Some of the key techniques are described below:
Google File System (GFS): GFS is a proprietary distributed file system created by Google Inc. It was designed to provide reliable and efficient data access using large clusters of commodity hardware. GFS is optimised for Google's primary storage and usage requirements, in particular the search engine, which generates enormous amounts of data that must be retained. Large files are divided into 64-megabyte chunks, which are rarely overwritten or shrunk. Extra precautions are taken against the high failure rate of individual nodes and the consequent loss of data (Agrawal, Das & El Abbadi, 2011). Nodes fall into two categories: a single master node and a large number of chunkservers. Chunkservers store the 64 MB chunks, which are analogous to sectors or clusters in regular file systems. Each chunk is replicated several times across the network, with a minimum of three copies, and more redundancy where it is required.
The master server, by contrast, does not normally store the actual chunks, but rather all the metadata associated with them: the tables mapping the 64-bit chunk labels to chunk locations, the files the chunks make up, and the locations of the chunk copies. The master keeps this metadata current by receiving periodic updates from each chunkserver. Modification permission is handled by a system of time-constrained, expiring leases, in which the master grants a process permission to modify a chunk for a limited timeframe, during which no other process is granted modification rights. Programs access chunks by first querying the master for the locations of the desired chunks; if the chunks are not currently being operated on, the master replies with the locations, and the program then contacts the relevant chunkserver and receives the data directly. This differs from other file systems in that GFS is not implemented in the operating system's kernel but is instead provided as a user-space library (Mosco, 2015).
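The read path described above can be sketched in a few lines of Python. This is an illustrative toy, not Google's actual client API: the `Master` class, `lookup` and `read` functions, and the in-memory chunkserver dictionaries are all invented for the example. Only the 64 MB chunk size, the three-way replication, and the split between metadata (from the master) and data (directly from a chunkserver) come from the description above.

```python
# Illustrative sketch of the GFS client read path: the client asks the
# master only for metadata, then reads chunk data directly from a replica.

CHUNK_SIZE = 64 * 1024 * 1024  # GFS fixes chunks at 64 MB

class Master:
    """Hypothetical master holding only metadata: file -> chunk handles -> replica locations."""
    def __init__(self):
        self.file_chunks = {}      # filename -> ordered list of 64-bit chunk handles
        self.chunk_replicas = {}   # chunk handle -> list of chunkserver addresses

    def lookup(self, filename, offset):
        index = offset // CHUNK_SIZE               # which 64 MB chunk holds this offset
        handle = self.file_chunks[filename][index]
        return handle, self.chunk_replicas[handle]

def read(master, chunkservers, filename, offset, length):
    handle, replicas = master.lookup(filename, offset)  # step 1: metadata from master
    server = chunkservers[replicas[0]]                  # step 2: pick a replica
    start = offset % CHUNK_SIZE
    return server[handle][start:start + length]         # step 3: read data directly

# Usage with toy in-memory "chunkservers":
m = Master()
m.file_chunks["/logs/web.log"] = ["h1"]
m.chunk_replicas["h1"] = ["cs-a", "cs-b", "cs-c"]       # replicated three times
servers = {addr: {"h1": b"GET /index.html 200"} for addr in ["cs-a", "cs-b", "cs-c"]}
print(read(m, servers, "/logs/web.log", 0, 3))          # b'GET'
```

Note how the master is never on the data path: after the lookup, the bytes flow straight from the chunkserver to the client, which is what keeps the single master from becoming a bottleneck.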
Bigtable: A Bigtable is a sparse, distributed, persistent multi-dimensional sorted map. The map is indexed by a row key, a column key and a timestamp; each value in the map is an uninterpreted array of bytes. Bigtable relies on a highly available and persistent distributed lock service called Chubby. A Chubby service consists of five active replicas, one of which is elected master and actively serves requests (Talia, 2013). The service is live when a majority of the replicas are running and can communicate with each other. Bigtable uses Chubby for several tasks: to ensure that there is at most one active master at any time; to store the bootstrap location of Bigtable data; to discover tablet servers and finalise tablet server deaths; to store Bigtable schema information; and to store access control lists. Tablet location information is held in a three-level hierarchy. The first level is a file stored in Chubby that contains the location of the root tablet (Ji, Li, Qiu, Awada & Li, 2012). The root tablet holds the location of all tablets of a special METADATA table, and each METADATA tablet holds the locations of a set of user tablets. The root tablet is simply the first tablet in the METADATA table, but it is treated specially: it is never split, which guarantees that the tablet location hierarchy never exceeds three levels.
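The three-level location lookup can be illustrated with a small Python sketch. All names here (`Tablet`, `locate`, `find_tablet_server`, the tablet locations) are hypothetical; the point is only the chain the text describes: Chubby file → root tablet → METADATA tablet → user tablet's server.

```python
# Hypothetical sketch of Bigtable's three-level tablet-location hierarchy.
import bisect

class Tablet:
    """Maps the sorted start row-keys of child entries to their locations."""
    def __init__(self, entries):
        # entries: sorted list of (start_row_key, location)
        self.keys = [k for k, _ in entries]
        self.locs = [loc for _, loc in entries]

    def locate(self, row_key):
        i = bisect.bisect_right(self.keys, row_key) - 1  # rightmost start <= row_key
        return self.locs[i]

def find_tablet_server(chubby_root_location, tablets, row_key):
    root = tablets[chubby_root_location]   # level 1: root tablet (location from Chubby)
    meta = tablets[root.locate(row_key)]   # level 2: the responsible METADATA tablet
    return meta.locate(row_key)            # level 3: the user tablet's server

tablets = {
    "root":   Tablet([("", "meta-1"), ("m", "meta-2")]),
    "meta-1": Tablet([("", "ts-101"), ("g", "ts-102")]),
    "meta-2": Tablet([("m", "ts-201"), ("t", "ts-202")]),
}
print(find_tablet_server("root", tablets, "hello"))  # ts-102
```

Because the hierarchy is capped at three levels, any row can be located in at most three lookups, and clients can cache the intermediate locations to skip most of them.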
Map-Reduce Programming Model: MapReduce supports distributed computing over large data sets on clusters of computers. It is inspired by the map and reduce functions commonly used in functional programming, although their purpose in the MapReduce framework is not the same as in their original forms. MapReduce is a framework for processing large datasets for certain kinds of distributable problems using a large number of nodes, collectively known as a cluster. The computation may run over data stored in either unstructured or structured form. In the map step, the master node takes the input, divides it into smaller sub-problems and distributes them to worker nodes (Hashem, Yaqoob, Anuar, Mokhtar, Gani & Khan, 2015). A worker node may repeat this in turn, leading to a multi-level tree structure. Each worker node processes its smaller problem and returns the answer to its master node. In the reduce step, the master node collects the answers to all the sub-problems and combines them to form the output.
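The map and reduce steps described above can be illustrated with the classic word-count example, here as a minimal single-process Python sketch. A real MapReduce system distributes the map and reduce tasks across worker nodes and shuffles intermediate data over the network; the function names below are invented for the example.

```python
# Minimal single-process sketch of the map / shuffle / reduce flow.
from collections import defaultdict

def map_phase(document):
    # Map: emit an intermediate (key, value) pair for every word
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    # Shuffle: group intermediate values by key before reducing
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce: combine all values for each key into the final answer
    return {key: sum(values) for key, values in grouped.items()}

documents = ["big data in the cloud", "big data big compute"]
intermediate = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle(intermediate))
print(counts["big"])  # 3
```

Each document could be mapped on a different worker, and each key reduced on a different worker, which is what makes the model scale: the map and reduce calls are independent of one another.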
These are the three major techniques used by Google in its cloud computing infrastructure, and they are among the most important techniques in general. Many other cloud service providers use other techniques, but these three are used by most organisations. All of them are capable of handling big data that cannot be managed with traditional data management software (Yang, Huang, Li, Liu & Hu, 2017). Researchers are working hard to maximise the potential of cloud computing so that ever larger volumes of big data can be handled; this calls for better methods of processing and analysing the data, which in turn requires upgrades to the infrastructure.
Big data in the cloud is not all roses, however. Migrating big data to a cloud network raises several kinds of hurdles. The primary issues are:
Less control over security: Such huge datasets often contain sensitive information such as individuals' addresses, credit card details, social security numbers and other personal data, so it is critically important that the data is kept protected. A data breach can attract serious penalties under various regulations, and it can damage the company's brand image, leading to a loss of revenue or customers. A company needs to ensure that security does not become a barrier to migrating to the cloud, and it must accept that it will have less direct control over its own data (Fernández, del Río, López, Bawakid, del Jesus, Benítez & Herrera, 2014). This can be a significant organisational change and may cause some discomfort. To deal with these problems, a company should carefully assess the security protocols and understand the shared responsibility model of its cloud service provider, so that everyone knows their roles and obligations in the cloud infrastructure.
Less control over compliance: Compliance is another major issue users face when moving data to the cloud. Service providers maintain specific levels of compliance with various regulations, such as PCI, HIPAA and numerous others; as with security, one has less control over how the data's compliance needs are met (Xiaofeng & Xiang, 2013).
Network dependency and latency challenges: Easy access to the data is possible only because the data is available over a network connection. This dependency on the internet means the system can be susceptible to service interruptions. Latency can also be an issue in the cloud environment, given the volume of data being moved, processed and analysed at any point in time (Dialani, 2019).
Possible future developments
Cloud computing is the future of the IT industry, and researchers have suggested several possible future directions:
Edge computing: As firms need near-instant access to data and compute resources to serve their clients, they are turning to edge computing (Assunção, Calheiros, Bianchi, Netto & Buyya, 2015). Edge computing moves certain computing processes out of centralised data centres and places systems closer to users, sensors and devices. This is especially beneficial for the IoT, which must collect and process large volumes of information in real time at low latency.
Quantum computing: Data may be processed far faster with the help of quantum computers and servers, which could also reduce the amount of power consumed by computing tasks (Purcell, 2014).
Serverless computing: Going serverless requires a new development paradigm as well as updates to current technology. The entry of open-source solutions into this space will accelerate the adoption of serverless applications across different sectors (Gupta, Gupta & Mohania, 2012).
Conclusion
From the above report, it can be concluded that cloud computing provides an appropriate solution for handling big data. Key techniques used in cloud computing include the Google File System (GFS), Bigtable and the MapReduce programming model. At the same time, several issues confront cloud computing as a big data solution: less control over security, less control over compliance, and network dependency and latency challenges are the three major ones. Edge computing, quantum computing and serverless computing are the likely future directions in which cloud development will occur.
References
Agrawal, D., Das, S., & El Abbadi, A. (2011, March). Big data and cloud computing: current
state and future opportunities. In Proceedings of the 14th International Conference on
Extending Database Technology (pp. 530-533). ACM.
Antonopoulos, N., & Gillam, L. (2010). Cloud computing. London: Springer.
Assunção, M. D., Calheiros, R. N., Bianchi, S., Netto, M. A., & Buyya, R. (2015). Big Data
computing and clouds: Trends and future directions. Journal of Parallel and
Distributed Computing, 79, 3-15.
Chan, M. (2018). Big Data in the Cloud: Why Cloud Computing is the Answer to Your Big Data Initiatives. Retrieved from: https://www.thorntech.com/2018/09/big-data-in-the-cloud/
Dialani, P. (2019). The Future of Cloud in 2020. Retrieved from: https://www.analyticsinsight.net/the-future-of-cloud-in-2020/
Fernández, A., del Río, S., López, V., Bawakid, A., del Jesus, M. J., Benítez, J. M., &
Herrera, F. (2014). Big Data with Cloud Computing: an insight on the computing
environment, MapReduce, and programming frameworks. Wiley Interdisciplinary
Reviews: Data Mining and Knowledge Discovery, 4(5), 380-409.
Gupta, R., Gupta, H., & Mohania, M. (2012, December). Cloud computing and big data
analytics: what is new from databases perspective?. In International Conference on
Big Data Analytics (pp. 42-61). Springer, Berlin, Heidelberg.
Hashem, I. A. T., Yaqoob, I., Anuar, N. B., Mokhtar, S., Gani, A., & Khan, S. U. (2015). The
rise of “big data” on cloud computing: Review and open research issues. Information
systems, 47, 98-115.
Ji, C., Li, Y., Qiu, W., Awada, U., & Li, K. (2012, December). Big data processing in cloud
computing environments. In 2012 12th international symposium on pervasive
systems, algorithms and networks (pp. 17-23). IEEE.
Mosco, V. (2015). To the cloud: Big data in a turbulent world. Routledge.
Purcell, B. M. (2014). Big data using cloud computing. Journal of Technology Research, 5, 1.
Sookhak, M., Gani, A., Khan, M. K., & Buyya, R. (2015). Dynamic remote data auditing for
securing big data storage in cloud computing (Doctoral dissertation, Fakulti Sains
Komputer dan Teknologi Maklumat, Universiti Malaya).
Talia, D. (2013). Clouds for scalable big data analytics. Computer, 46(5), 98-101.
Xiaofeng, M., & Xiang, C. (2013). Big data management: concepts, techniques and
challenges. Journal of Computer Research and Development, 1(98), 146-169.
Yang, C., Huang, Q., Li, Z., Liu, K., & Hu, F. (2017). Big Data and cloud computing:
innovation opportunities and challenges. International Journal of Digital
Earth, 10(1), 13-53.