logo

Design and Implementation of a Data Warehouse for GreenHomeHelp

   

Added on  2022-12-03

11 Pages1392 Words73 Views
Introduction
Modern businesses are largely dependent on data and information insights. Governments around
the world are also adopting data and information driven policy making processes. To effectively
retrieve information from operational data, data warehouses are employed; as they ease and
streamline the retrieval process.
This report presents the design and implementation of a data warehouse for GreenHomeHelp.
The warehouse was developed on an Oracle database, with data extracted from the given MS
Access database. The data was first extracted from Access through exporting to a flat file, and
then imported into oracle staging area by use of a data loader. Cleaning and transformation then
followed, before the data was loaded into the data mart.
Design Overview
Since the data warehouse design focuses on the 5 critical questions that the government want
answers to; the warehouse only contains tables with the relevant information for the given
scenario. A simplified design of the warehouse is as shown below.
Design and Implementation of a Data Warehouse for GreenHomeHelp_1
ETL: Extraction, Transformation and Loading
Data from the access database was first exported into a comma delimited text file, to facilitate
loading into oracle by use of a data loader. All the tables from the access database were exported
and then loaded into a staging area.
Figure 1.0 Exporting MS Access data into a flat file.
Loading into staging area
Control files were developed for each flat file that represents each table. The control files were
used to load data into the staging tables; a sample control file is as shown below;
LOAD DATA
INFILE*
INTO TABLE EMPLOYEE_TITLE
FIELDS TERMINATED BY "," OPTIONALLY ENCLOSED BY '"'
(TITLEID,TITLEDESCRIPTION)
BEGINDATA
1,"Dr"
2,"Mr"
Design and Implementation of a Data Warehouse for GreenHomeHelp_2
3,"Miss"
4,"Ms"
5,"Prof"
6,"Mrs"
Figure 2.0 Control file for loading data into the staging area
Figure 3.0 Loading data using SQL Loader; each control file was loaded into the staging area
using the SQLLOADER.
Figure 4.0 Confirmation that the data was successfully loaded into the table on the staging area.
Design and Implementation of a Data Warehouse for GreenHomeHelp_3
Figure 5.0 Staging Area and the warehouse
Data Cleaning
To ensure consistency and quality of data, a thorough data cleaning exercise was undertaken. The
following approaches were used in the cleaning exercise;
Eliminating unwanted hyphens in strings, especially on names and addresses;
Extracting month and week from date and storing it in a column of its own
Eliminating unwanted characters in names such as continuous dots
Data Cleaning SQL Statements
UPDATE `temp2` SET `First_Name` = REPLACE(First_Name, '.', '')
UPDATE `temp2` SET `Last_Name` = REPLACE(`Last_Name`, '.', '')
UPDATE `temp2` SET `Address_Line1` = REPLACE(`Address_Line1`, '\'', '')
UPDATE `temp2` SET `Address_Line2` = REPLACE(`Address_Line2`, '\'', '')
Data Loading into the Warehouse
To load data into the warehouse from the staging area, data dumps were generated using plsql as
shown in the image below; select queries were used to select the appropriate fields to load into
the data warehouse.
Design and Implementation of a Data Warehouse for GreenHomeHelp_4

End of preview

Want to access all the pages? Upload your documents or become a member.

Related Documents
Data Warehouse Components & Structures : Assignment
|16
|4219
|143