Hadoop in Big Data Analysis


Added on  2023/03/17

AI Summary
This article provides an overview of Hadoop and its components, including HDFS, YARN, and Map-Reduce. It discusses the advantages of using Hadoop for big data analysis, such as scalability and flexibility, as well as the disadvantages, such as slow processing speed and complexity. The article concludes by highlighting the importance of considering the advantages and drawbacks when using Hadoop for data analysis.

Running head: HADOOP IN BIG DATA ANALYSIS
Hadoop in Big Data analysis
Name of the Student
Name of the University
Author's note

Introduction
Hadoop is an open-source Apache framework, written in Java, for analysing large
datasets on a cluster of computers using the Map-Reduce technique. Its most
important property is that it divides the processing among the computers of the
cluster in a distributed environment, which makes it efficient compared to other
analysis frameworks.
Hadoop and its components and way of analysis
The framework is built on three main components: HDFS, which is the storage
layer; Hadoop YARN; and Hadoop Map-Reduce, which is the application layer.
HDFS uses a master-slave topology to manage distributed storage. It runs two
daemons, the NameNode and the DataNode. The NameNode runs on the master system
and a DataNode runs on each slave system.
The NameNode stores the file tree, which is used to locate the files stored on
the DataNodes. Each time the system starts up, the DataNodes connect to the
NameNode. Afterwards, they stay in regular contact with the NameNode in order to
serve requests.
Stored data is split into blocks, and each block is replicated on several
DataNodes in the cluster. The replication factor chosen can heavily affect the
reliability and performance of a Hadoop system.
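
The effect of the replication factor on reliability can be illustrated with a small simulation. This is only a sketch under stated assumptions: the 128 MB block size, the DataNode names and the round-robin placement are chosen for illustration, and the function names are hypothetical rather than part of any Hadoop API. Real HDFS placement also takes racks into account.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size in bytes (128 MB)


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the number of blocks needed to store a file of file_size bytes."""
    return math.ceil(file_size / block_size)


def place_replicas(num_blocks, data_nodes, replication_factor):
    """Assign each block to replication_factor distinct DataNodes.

    Round-robin placement is an assumption made for this sketch.
    """
    if replication_factor > len(data_nodes):
        raise ValueError("replication factor exceeds the number of DataNodes")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            data_nodes[(block_id + i) % len(data_nodes)]
            for i in range(replication_factor)
        ]
    return placement


def surviving_blocks(placement, failed_nodes):
    """Return the ids of blocks that still have a replica on a live node."""
    failed = set(failed_nodes)
    return [block for block, nodes in placement.items() if set(nodes) - failed]


nodes = ["dn1", "dn2", "dn3", "dn4"]                 # hypothetical DataNodes
num_blocks = split_into_blocks(300 * 1024 * 1024)    # a 300 MB file
placement = place_replicas(num_blocks, nodes, replication_factor=3)

# Raw storage consumed grows linearly with the replication factor.
replica_count = sum(len(replicas) for replicas in placement.values())
```

With three replicas per block, every block in this example survives the loss of two DataNodes, whereas with a single replica the loss of one node already loses a block. The price is storage: the cluster holds three copies of every block.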
Map-Reduce is a programming model that enables parallel processing of data
across the cluster. The term covers two tasks, Map and Reduce (Gates and Dai,
p.26). In the Map task, the input is converted into a set of data in which the
individual elements are broken down into tuples (key-value pairs). Because Hadoop
can process arbitrary forms of data, reading the input is handled by the “Input
Format” and the “Record Reader”: the record reader converts the raw data into
key-value pairs. The Reduce task then takes the output of the Map task and
combines those tuples into a smaller set of tuples. Map-Reduce carries out this
conversion and processing in a highly fault-tolerant and resilient manner.
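
The flow from record reader through Map to Reduce can be sketched in plain Python with a word count. This is a single-process illustration, not Hadoop code: the function names are hypothetical, the line number is used as the record key for simplicity, and the grouping step between Map and Reduce is done in memory rather than across a cluster.

```python
from collections import defaultdict


def record_reader(raw_text):
    """Turn raw input into (key, value) records: (line number, line text)."""
    for line_number, line in enumerate(raw_text.splitlines()):
        yield line_number, line


def map_task(key, value):
    """Map: emit a (word, 1) tuple for every word in the line."""
    for word in value.lower().split():
        yield word, 1


def shuffle(mapped_pairs):
    """Group all values that share a key before they reach the Reduce task."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_task(key, values):
    """Reduce: combine the values for one key into a single tuple."""
    return key, sum(values)


def run_job(raw_text):
    """Run the record reader, Map, grouping and Reduce steps in sequence."""
    mapped = [
        pair
        for key, value in record_reader(raw_text)
        for pair in map_task(key, value)
    ]
    return dict(reduce_task(key, values) for key, values in shuffle(mapped).items())


counts = run_job("big data\nbig cluster")
```

Each call to map_task depends only on its own record, and each call to reduce_task only on the values for its own key, which is what allows a cluster to run many of them in parallel.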
The YARN layer is responsible for scheduling analysis jobs and managing
resources across the different daemons. Its components include the Application
Manager, the Scheduler and the Application Master. The Application Manager
accepts the jobs submitted by clients, negotiates the first container for the
ApplicationMaster and restarts that container after an application failure. The
Scheduler allocates resources to the applications that are running at a given
point in time.
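
The scheduler's role of allocating resources to running applications can be sketched as a toy model. Everything here is an assumption made for illustration: the class names, the memory figures and the first-come, first-served policy are hypothetical and do not reproduce YARN's actual schedulers or API.

```python
from dataclasses import dataclass, field


@dataclass
class Application:
    name: str
    memory_mb: int  # memory requested for the application's container


@dataclass
class Scheduler:
    total_memory_mb: int
    allocations: dict = field(default_factory=dict)

    def free_memory(self):
        """Memory not yet allocated to any application."""
        return self.total_memory_mb - sum(self.allocations.values())

    def allocate(self, app):
        """Grant the request if enough memory is free; return True on success."""
        if app.memory_mb <= self.free_memory():
            self.allocations[app.name] = app.memory_mb
            return True
        return False

    def release(self, app_name):
        """Return an application's memory to the pool when it finishes."""
        self.allocations.pop(app_name, None)


def submit_jobs(scheduler, apps):
    """Accept jobs in submission order; return the names left waiting."""
    waiting = []
    for app in apps:
        if not scheduler.allocate(app):
            waiting.append(app.name)
    return waiting


scheduler = Scheduler(total_memory_mb=4096)
waiting = submit_jobs(scheduler, [
    Application("etl", 2048),
    Application("report", 1536),
    Application("training", 1024),
])
```

In this run the third job has to wait because only 512 MB remain free; once an earlier job releases its memory, the waiting job can be allocated.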
Advantages and disadvantages
Advantages
Scalability: Storage is one of the most important aspects of data analysis, and
Hadoop addresses it by providing a scalable storage platform for the data to be
analysed. Hadoop can store and distribute very large datasets across hundreds of
inexpensive servers and operate on them in parallel.
Flexibility: Hadoop lets an organization easily access new data sources and work
with that data, whether structured or unstructured, in order to gain insights and
value from it.
Cost effective: To process the tremendous amounts of data generated in real time,
traditional data analysis requires large data storage, which Hadoop does not
require. With Hadoop, it is possible to analyse data coming from social media
platforms, mailboxes and so on (Gates and Dai, p.35).
Disadvantages
Slow processing speed: The dataset has to pass through the Map and Reduce
processes before values can be extracted from it. These tasks have higher latency
than traditional data analysis tools and techniques.
Complexity in use: Hadoop is not very user friendly. It requires large amounts
of custom code for every operation involved in analysing the selected dataset and
extracting value from it.
Batch processing only: Hadoop is useful only for batch processing of data and is
of little help with streamed data (Landset et al., p.10). In batch processing, the
memory of the cluster is not used to its full capacity. Moreover, the processing
overhead of read and write operations is another issue that hinders the use of
the tool.
Conclusion
Every data analysis solution comes with its own advantages and drawbacks.
Organizations can therefore use Hadoop for data analysis by exploiting its
advantages while minimizing the impact of drawbacks such as latency.

References
Gates, Alan, and Daniel Dai. Programming Pig: Dataflow Scripting with Hadoop. O'Reilly Media, Inc., 2016.
Landset, Sara, et al. "A Survey of Open Source Tools for Machine Learning with Big Data in the Hadoop Ecosystem." Journal of Big Data 2.1 (2015): 24.
"Apache Hadoop." Hadoop.apache.org. N.p., 2019. Web. 10 May 2019.