Hadoop in Big Data Analysis
Name of the Student
Name of the University
Author's Note

Introduction
Hadoop can be defined as a framework that supports the analysis of large
datasets across clusters of computers using the MapReduce technique. It is an
open-source Apache framework developed in Java. The most important aspect of
Hadoop is that it divides the complete processing among the computers of a
cluster in a distributed environment, which makes it more efficient than other
analysis frameworks.
Hadoop, its components and method of analysis
The complete framework is built upon three main components: the storage
layer, known as HDFS (Hadoop Distributed File System); Hadoop YARN, the
resource-management layer; and Hadoop MapReduce, the application layer.
HDFS maintains a master-slave topology for managing the distribution of
storage. It runs two daemons, namely the NameNode and the DataNode. The
NameNode executes on the master system, while the DataNodes execute on the
slave systems.
The NameNode stores the file-system tree, which provides access to the files
stored on the DataNodes. At every start-up of the system, the DataNodes
connect to the NameNode, after which they communicate with it regularly to
serve its requests.
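For illustration, the following is a minimal sketch of a Java client that reads a file through Hadoop's FileSystem API. The NameNode address (hdfs://namenode-host:8020) and the file path /data/sample.txt are placeholder assumptions rather than values from any particular cluster. Opening the file causes the client to ask the NameNode for the block locations; the bytes themselves are then streamed from the DataNodes.

import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadExample {
    public static void main(String[] args) throws Exception {
        // Point the client at the NameNode; host and port are placeholders.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode-host:8020");
        FileSystem fs = FileSystem.get(conf);

        // open() asks the NameNode for the file's block locations;
        // the data itself is then read from the DataNodes.
        Path file = new Path("/data/sample.txt");
        try (BufferedReader reader =
                new BufferedReader(new InputStreamReader(fs.open(file)))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
        fs.close();
    }
}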
The stored data is also replicated across different blocks in the cluster of
computer systems. The replication factor used can heavily impact the
reliability and performance of a Hadoop system.
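To show how the replication factor can be controlled, the sketch below sets the standard dfs.replication property for newly created files and raises the replication of an existing file through FileSystem.setReplication(). The path and the factor of 3 (the usual HDFS default) are illustrative assumptions.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Default replication factor for files created by this client
        // (the same setting as dfs.replication in hdfs-site.xml).
        conf.setInt("dfs.replication", 3);
        FileSystem fs = FileSystem.get(conf);

        // Change the replication factor of an existing file;
        // the path is a placeholder.
        fs.setReplication(new Path("/data/sample.txt"), (short) 3);
        fs.close();
    }
}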
For the MapReduce layer, it can be stated that this is a programming model
that enables the parallel processing of data on the cluster of systems. The
term MapReduce consists of two tasks, Map and Reduce (Gates, Alan and Daniel,
p.26). In the Map task, the input is converted into a set of data in which
the individual elements are broken down into tuples (key-value pairs). Since
Hadoop must be able to process arbitrary forms of data, this conversion is
handled by the "InputFormat" and the "RecordReader": the RecordReader turns
the raw data into key-value pairs. MapReduce converts and processes the data
in a highly fault-tolerant and resilient manner.
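To make the Map and Reduce tasks concrete, the following is a sketch of the classic word-count job written against Hadoop's Java MapReduce API. With the default TextInputFormat, the RecordReader delivers each line of raw input to the mapper as a (byte offset, line) key-value pair; the input and output paths /input and /output are placeholders.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map task: receives each line as a (byte offset, line) pair
    // and emits a (word, 1) tuple for every token in the line.
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    // Reduce task: receives each word with all of its counts and sums them.
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values,
                              Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Input and output paths are placeholders.
        FileInputFormat.addInputPath(job, new Path("/input"));
        FileOutputFormat.setOutputPath(job, new Path("/output"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Because each Map and Reduce task works on an independent slice of the data, a failed task can simply be re-executed on another node, which is what gives this layer its fault tolerance.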
The YARN layer is responsible for scheduling the analysis jobs and managing
resources across the different daemons. It also includes other components,
such as the ApplicationMaster, which manages the lifecycle of each individual
job, and the ResourceManager, which arbitrates resources across the cluster.
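As a small sketch of YARN acting as the resource-management layer, the following snippet uses Hadoop's YarnClient API to ask the ResourceManager for the applications it is currently managing. It assumes the ResourceManager address is supplied by the standard configuration files (e.g. yarn-site.xml) on the classpath.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;

public class YarnListApps {
    public static void main(String[] args) throws Exception {
        // Connects to the ResourceManager configured in yarn-site.xml.
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new Configuration());
        yarnClient.start();

        // Ask the ResourceManager for the applications it is managing.
        for (ApplicationReport app : yarnClient.getApplications()) {
            System.out.println(app.getApplicationId() + "\t"
                    + app.getName() + "\t"
                    + app.getYarnApplicationState());
        }
        yarnClient.stop();
    }
}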
