logo

Extraction Transformation Loading (ETL)

   

Added on  2022-08-12

40 Pages4377 Words29 Views
ETL
Steel L.C. Database
[Type the author name]
[Pick the date]

P a g e | 2
Introduction
ETL process is vital for data warehousing because the data tables are built from stage
areas into data marts, cubes or summary tables. The ETL procedure completes their
data warehousing project. Minus the ETL method, there would not be a way to use data
for BI or advance significant insights from all the data gathered by companies. “The
challenge in data warehouse environments is to integrate, rearrange as well as
consolidate large volumes of data over many systems, thereby providing a new unified
information base for business intelligence (Lane, 2005.).”
Analysis as well as Answers to all the 4 questions
ETL process has three steps: 1. Extract 2. Transform as well as 3. Load. Extraction
consists of valuable data being identified as well as extracted from several sources.
There have been some cases when the useful date was not identified instantly. The
information that was obtained without initial desirability. The second step is extracted
data is transformed into a structured setup may be predefined by DW format. Loading is
when the structured information is loaded into an appropriate table in the DW.
The four categories of ETETL L technologies are Sophisticated, Enabler, Simpler as
well as rudimentary as defined by Solomon (2006). All of these technologies have
advantages as well as disadvantages that should be careful before using them on a DW
project. Example a sophisticated technology will have a high cost, as well as above
average learning curve, it does have good meta-data capabilities. A combination of
enabler, more straightforward as well as necessary techniques can be bought at a low
cost, but DWA’s will need to continue meta-data diligently.
ETL PROCESS 3
Example of ETL technologies where: ETL, Informatica (sophisticated0, data junction
(enabler), Transact SQL simpler, Java, as well as COBOL (rudimentary). (Solomon)
Today, available ETL tools have improved, as well as a nearly free option was available
to incorporate: ApachNifi, Jasper as well as Pentaho Data combination. The commercial
decision includes Confluent, Oracle Data integrator, SAS ETL as well as Alooma. ETL

P a g e | 3
Tools suggest “Begin with defining the business requirements, then consider the
technical aspects, as well as finally, choose the right ETL tool (n.d.),”
ETL stands for extraction,transformation,as well as loading of data as it name suggests
three important word which in itself is a process,i.e. to form data warehousing you need
to extract data from either spreadsheet or other data. It is not only confined to
this,butthe ETL process also plays a vital role in the updating of real-time data
warehousing efforts depending upon the design of the ETL(Berliantara, 2017).
There are three steps of the ETL process which are as follows,i.e.,Extraction,
Transformation,as well as Loading.

P a g e | 4
Extraction
Extraction step of the ETL process connects with source data as well as then collects
necessary data as well as then select the data for further analytical processing. The
main aim of this process is to collect data from different sources.
Transformation
Transformation Process is the second step of the ETL process. The primary objective of
this step is to transform extracted data into the specified standard format by applying
specific rules or functions.
Loading
The next phase of the ETL process is loading that includes transferring the extracted
data or data transformed. These data are transferred or imported to database targeted
or data warehouse.
The most common processes used for these steps are filtering, sorting standardizing,
etc.
The four categories of ETL technologies are as follows:
Cost Control
The primary costs in ETL technologies will constitute hardware, software, support,
consulting, people as well as cloud. The ETL Technologies manufacturer will have to
keep cost as well as benefit analysis in order to explore in the industry(B., M. & A.,
2015).
Revenue Generation
ETL technologies have played a vital role in the driving of revenue generation of an
organization.
Security as well as compliance

P a g e | 5
For attaining superiority you have to take extreme care of the security of data, the ETL
technologies had taken extreme measure in about the security of the data.
New initiatives
ETL technologies must continue to educate as well as re-educate themselves on
technology as well as consumer trends in order to stay in the market.
Examples of ETL technologies are as follows:
1) Informatica – Power Center.
2) IBM – Infosphere Information Server.
3) Oracle Data Integrator.
4) Microsoft – SQL Server Integrated Services (SSIS)
5) Ab Initio.
6) Talend – Talend Open Studio for Data Integration.
7) CloverETL.
8) Pentaho Data Integration
ETL Plan
ETL stands for Extract-Transform-Load. ETL covers a process of how the data is loaded
from the source system to the data warehouse.
Extract
The extraction step covers data extraction from the source system and makes it
accessible for further processing. The primary purpose of the extraction step is to
retrieve all the necessary data from the source system with as few resources as
possible. Extraction steps should be designed to not adversely affect the source system
for performance, response time, or any type of lock.

P a g e | 6
Cleaning
The cleaning step is one of the most important steps to ensure the quality of the data in
the data warehouse. Cleaning ensures that identifiers are unique, null values are
converted to standardized, unavailable/not provided values, phone numbers are
converted to standardized format, address fields are validated against each other
(state/country, city/state, City / ZIP code, City / Street).
Tranform
This includes a set of rules for converting data from source to target. This includes
converting data to an Fit dimension using the same units so that it can be joined later.
This includes joining data from many sources, generating aggregation, surrogate keys,
sorting, and deriving calculated values.
Load
It is essential to properly load with as few resources as possible. To make the load
process more efficient, you can disable constraints and indexes before loading, and
then enable them only after loading. Referential integrity must be maintained by the ETL
tool to ensure consistency.

P a g e | 7
i. Target Data needed in the Data Warehouse
In our data warehouse we have two data marts, one is customer count as well as other
is product sales. In the target data warehouse on the whole for these two data marts we
needed five tables. Those are three dimension tables product dimension, store
dimension as well as time dimension, as well as two fact tables Product Sales fact as
well as Product Category fact
Target Table Columns Data type
DIM_STORE STORE_KEY int
NAME varchar(50)
CITY varchar(50)
STORE varchar(50)
Target
Table
Columns Data type
DIM_TIME Time_key int
WeekNumber int
Start varchar(50)

P a g e | 8
End varchar(50)
Special Events varchar(50)
Target Table Columns Data type
DIM_PRODUCT Product_key int
ProductID varchar(50)
Category varchar(50)
Description varchar(50)
Target
Table
Columns Data type
Product Category varchar(50)
Sales float
STORE int
WEEK int
Target Columns Data type

End of preview

Want to access all the pages? Upload your documents or become a member.

Related Documents
Data Analytics and Business Analytics: ETL for Data Warehouse Construction
|21
|5842
|355

SAP Implementation for Global Bicycles Corporation
|32
|3975
|145

Data merging and Cleaning Assignment PDF
|10
|2051
|29

Data and Information Management
|17
|2879
|72

Trends of Business Intelligence - PDF
|7
|1673
|79

Database Application Direction | Data Warehouse
|5
|799
|22