logo

Developing a Proof of Concept Data Warehouse/Data Mart using Dimensional Modelling

   

Added on  2022-11-16

11 Pages2251 Words475 Views
1.
Develop a proof of concept data warehouse/data Mart using dimensional modelling
by capturing data from an existing data source(s). Preferably do not use North-wind
database as it will be used for demonstration purpose during lectures. Documents
your reasons for selecting the subject area(s), identify key stakeholders,formalise
the vision and goals and requirements for developing the data warehouse.
2.
Develop and present a suitable schema for data warehouse. Discuss your reasons
for the design.
3.
Using Microsoft SQL Server, implement your tables and extract transform and load
data from the operational source(s) into the data warehouse. This can be done
using any available tool such as SSIS or by writing SQL statements.
Ans:
1.
A dimensional model is a data structure technique optimized for Data warehousing
tools.
A Dimensional model is designed to read, summarize, analyze numeric information
like values, balances, counts, weights, etc. in a data warehouse.
Relation models are optimized for addition, updating and deletion of data in a real-
time System.
These dimensional and relational models have their unique way of data storage that
has specific advantages.
In the relational mode, normalization and ER models reduce redundancy in data.
On the contrary, dimensional model arranges data in such a way that it is easier to
retrieve information and generate reports.
Hence, Dimensional models are used in data warehouse systems and not a good fit
for relational systems.
A data mart is focused on a single functional area of an organization and contains a
subset of data stored in a Data Warehouse.
A data mart is a condensed version of Data Warehouse and is designed for use by a
specific department, unit or set of users in an organization.

E.g., Marketing, Sales, HR or finance.
It is often controlled by a single department in an organization.
Data Mart usually draws data from only a few sources compared to a Data
warehouse.
Data marts are small in size and are more flexible compared to a Datawarehouse.
The data model of the classical data warehouse (formally, dimensional model) does
not offer comprehensive support for temporal data management.
The underlying reason is that it requires consideration of several temporal aspects,
which involve various time stamps.
Also, transactional systems, which serves as a data source for data warehouse,
have the tendency to change themselves due to changing business requirements.
The classical dimensional model is deficient in handling changes to transaction
sources.
This has led to the development of various schemes, including evolution of data and
evolution of data model and versioning of dimensional model.
These models have their own strengths and limitations, but none fully satisfies the
above-stated broad range of aspects, making it difficult to compare the proposed
schemes with one another.
The schemes that satisfy such challenging aspects faced by a data warehouse and
proposes taxonomy for characterizing the existing models to temporal data
management in data warehouse.
A data warehouse (DWH) is a system used to store information for use in data
analysis and reporting.
Data marts are areas of a data warehouses used to store information needed by a
single department or even by an individual user.
All relevant data is stored inside the company DWH.
Most users, however, only need to access certain subsets of data, like those relating
to sales, production, logistics or marketing.
Data marts are important from both a security standpoint and from a user
standpoint.
There are two different approaches to the data warehouse-data mart relationship:
Top-Down: Data marts are created from the data warehouse.

Bottom-Up: Data marts are created first, then combined into a data warehouse.
2.
The star schema is the simplest model used in DWH.
Because the fact table is in the center of the schema with dimension tables around
it, it looks roughly like a star.
Th is especially apparent when the fact table is surrounded by five dimension
tables.
A variant of the star schema the centipede schema, where the fact table is
surrounded by a large number of small dimension tables.
Star schemas are very commonly used in data marts. We can relate them to the
top-down data model approach.
Let's analyze two star schemas (data marts) and then combine them to make a
single model.
Star Schema Example: Sales
The sales report is one today’s most common reports.
In most cases we could generate sales reports from the live system.
But when data or business size makes this too cumbersome, we’ll have to build a
data warehouse or a data mart to streamline the process.
After designing our star schema, an ETL process will get the data from operational
database(s), transform the data into the proper format for the DWH, and load the
data into the warehouse.
The model presented above contains of one fact table (colored light red) and five
dimension tables (colored light blue). The tables in the model are:
fact_sales – This table contains references to the dimension tables plus two facts
(price and quantity sold). Note that all five foreign keys together form the primary
key of the table.
dim_sales_type – This is a sales-type dimension table with only one attribute,
“type_name”.
dim_employee – This is an employee dimension table that stores basic employee
attributes: full name and birth year.

dim_product – This is a product dimension table with only two attributes (other than
the primary key): product name and product type.
dim_time – This table handles the time dimension.
It contains five attributes besides the primary key.
The lowest-level data is sales by date (action_date).
The action_week attribute is the number of the week in that year (i.e. the first week
in January would be given the number 1; the last week in December would get the
number 52, etc.)
The actual_month and actual_year attributes store the calendar month and year
when the sale occurred.
These can be extracted from the action_date attribute.
The action_weekday attribute stores the name of the day when the sale took place.
dim_store – This is a store dimension. For each store we’ll save the city, region,
state and country where it is located.
Star Schema Example: Supply Orders
There are a lot of similarities between this model, shown below, and the sales
model.
fact_supply_order
product_id
time_id
supplier_id
employee_id
price
quantity
int
int
int
int
decimal(8,2)

End of preview

Want to access all the pages? Upload your documents or become a member.

Related Documents
Data Warehouses vs Data Marts
|4
|608
|272

Logical Modeling and Design: Assignment
|25
|3038
|492

Data Storage Solutions for Data Analytics
|5
|670
|258

CS5504: Business Intelligence Systems, Infrastructures and Technologies Assignment 2022
|13
|1282
|10

Data Models and Database Technologies: A Critical Comparison
|12
|1788
|219