
Data Lake Architecture: Components and Architecture

Explore the management of big data using data lake technology and provide a design pattern for organizing a collection of datasets.


Added on  2022-10-01

About This Document

This article discusses the components and architecture of a data lake, including data ingestion, organisation, security and governance, indexing and search, analytics, and visualisation. It also proposes a data lake architecture for efficient operation.

Running head: DATA LAKE ARCHITECTURE
DATA LAKE ARCHITECTURE
Name of student
Name of university
Author’s note:
Table of Contents
Part 1: Data lake components
    Data ingestion component
    Data organisation component
    Data security and governance component
    Indexing and search component
    Analytics component
    Visualisation component
Part 2: Data lake architecture
References
Part 1: Data lake components
Data ingestion component
Data types: Data today falls into two broad varieties, structured and unstructured.
Structured data is data that has been organised into a formatted repository, commonly a
database, so that its elements are addressable for efficient processing and analysis
(Henaff, Bruna and LeCun 2015). Unstructured data is information that has no
predetermined data model or has not been organised in a predetermined manner. Data can be
streamed in real time or ingested in batches. Data ingestion is the process of gathering
data and importing it, either for immediate use or for storage in a database. When data is
ingested in real time, each item is imported as soon as the source emits it. When data is
ingested in batches, items are imported in discrete chunks at periodic intervals (Yao and
Van Durme 2014).
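The difference between real-time and batch ingestion could be illustrated with a minimal Python sketch. The function names, the `sink` callable and the `batch_size` parameter are assumptions made for illustration and do not belong to any particular ingestion product. For simplicity the batch is closed after a fixed number of records, which stands in for the periodic time interval described above.

```python
from itertools import islice
from typing import Callable, Iterable, List

Record = dict
Sink = Callable[[List[Record]], None]  # hypothetical writer into the data lake


def ingest_streaming(source: Iterable[Record], sink: Sink) -> int:
    """Import each record as soon as the source emits it (one write per record)."""
    writes = 0
    for record in source:
        sink([record])
        writes += 1
    return writes


def ingest_batches(source: Iterable[Record], sink: Sink, batch_size: int) -> int:
    """Import records in discrete chunks of at most batch_size items."""
    iterator = iter(source)
    writes = 0
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:  # source exhausted
            break
        sink(chunk)
        writes += 1
    return writes
```

Both functions return the number of writes made to the sink, which makes the trade-off visible: streaming issues one write per record and so keeps latency low, while batching issues fewer and larger writes.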
Hortonworks DataFlow: Hortonworks DataFlow is a scalable, real-time streaming
analytics platform that ingests, curates and analyses data so that insight and
actionable intelligence are available immediately. It addresses the main challenges that
organisations face with real-time stream processing at high scale and high volume, and
with data ingestion and provenance from IoT devices, streaming sources and edge
applications (Gates and Dai 2016). It drastically reduces the development time needed for
data integration.
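The idea of provenance, that is, recording where each item came from and when it entered the platform, could be sketched as follows. This is a generic illustration and not the Hortonworks DataFlow API: the `ProvenanceRecord` class, the `with_provenance` function and the `source_id` values are hypothetical names introduced here.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Tuple


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical metadata kept alongside each ingested item."""
    source_id: str    # which device or stream produced the item
    ingested_at: str  # UTC timestamp of ingestion, ISO 8601
    checksum: str     # SHA-256 of the payload, to detect later changes


def with_provenance(payload: dict, source_id: str) -> Tuple[dict, ProvenanceRecord]:
    """Pair a payload with a record of its origin and ingestion time."""
    # Serialise with sorted keys so equal payloads give equal checksums.
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    record = ProvenanceRecord(
        source_id=source_id,
        ingested_at=datetime.now(timezone.utc).isoformat(),
        checksum=hashlib.sha256(body).hexdigest(),
    )
    return payload, record
```

Keeping the record separate from the payload leaves the original data unchanged, so the checksum can later be recomputed to confirm that an item has not been altered since ingestion.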