Dimensional Modeling Case Study
VerifiedAdded on 2019/10/18
|25
|3038
|492
Report
AI Summary
This assignment requires students to create a fully documented dimensional model for their term project, using practical examples from industry and professional literature. The course includes 17 case studies on different domains such as retail, procurement, order management, CRM, accounting, HR, banking, telecom, transportation, university, health care, electronic commerce, insurance, and production. Students are expected to apply dimensional modeling concepts, including degenerate dimensions, junk dimensions, fact-less fact tables, snapshots, bridge dimensions, secondary dimensions, and views for materialization. The assignment aims to provide broad and realistic exposure to practical details of DW design, equivalent to an industrial-type course.
Contribute Materials
Your contribution can guide someone’s learning journey. Share your
documents today.
Lecture Notes- Module 4 Logical Modeling and Design
What is LM&D and what we will cover in this module?
Logical Modeling and Design of Data Warehouses is centered on the analysis and creation of an
implementation neutral representation for a future DW known as Dimensional Model. Technically
LM&D can be aligned with relational model (RM), multidimensional model (MM) or a hybrid model, a
combination of RM and MM. For two fundamental reasons our course will be to prepare DW design for
relational model implementation: a) educational benefits of building on theoretically grounded and well
standardized technology (our students are familiar with RM theory and the SQL from a pre-requisite
database course), and b) relational implementation scales much better (due to avoiding sparsity and to
stability/services of a proven mature relational database technology). Furthermore, evolution of relational
DW implementation technologies exploits data sourcing from RDBMS and on the other hand RDBMS
engines evolution itself incorporates successful concepts from multidimensional technology (both Oracle
and MS SQL Server included MM capabilities into their DB engines).
The textbook coverage (from chapters 8 and 9) provides excellent background with focus on transitioning
from DFM to Dimensional Models (DM) and grounded treatment of aggregate views (materialized views
candidates) and will be further supported in our lectures and assignments by industrial strength models
and experiences (including extensive case studies presented among the additional M4 readings)
following insights primarily from the Kimball’s The Data Warehouse Toolkit”-2ed and the Adamson’s
“Mastering Warehouse Aggregates” so that substantial value is added in terms of realistic examples. In
terms of visual representation our practice will include Dimensional Model diagrams and visualization
(specification of) logical views, both using the case tool ERwin. This module covers both Logical
Modeling and Logical Design for DW strictly in a relational implementation setting. Before outlining
steps and techniques to be elaborated a few terminology/definitional notes are in place.
The multidimensional modeling in relational systems is based on a star-schema and variations known as
snowflake and constellation (multiple-stars sharing dimensions) schemas. See definitions from the
textbook’s pp 221 and on.
Star schema consists of a fact table (typically with a set of measurement attributes) and a set of
corresponding dimensional tables referencing the fact table. A multidimensional view of data is obtained
by combining (SQL joins) fact table with related dimensional tables.
Snowflake schema involves at least partially normalized dimensions, and a constellation schema involves
several starts (facts tables). An outrigger is a secondary dimension not connected to any fact entities.
Logical Modeling
Logical modeling is covered by the textbook’s chapter 8 and elements from additional examples,
industrial case studies, from supplemented cases. The key point in LM with DM is a treatment of
dimensional dynamics (types of changes and modeling alternatives to support various temporal
scenarios). Additional practical solutions for DFM translation into DM require mastery of star schema
model upgrades with bridges, outriggers, mini-dimensions etc. and will require not only careful reading of
ch8 but also a thorough review of selected cases relevant to your term project work.
What is LM&D and what we will cover in this module?
Logical Modeling and Design of Data Warehouses is centered on the analysis and creation of an
implementation neutral representation for a future DW known as Dimensional Model. Technically
LM&D can be aligned with relational model (RM), multidimensional model (MM) or a hybrid model, a
combination of RM and MM. For two fundamental reasons our course will be to prepare DW design for
relational model implementation: a) educational benefits of building on theoretically grounded and well
standardized technology (our students are familiar with RM theory and the SQL from a pre-requisite
database course), and b) relational implementation scales much better (due to avoiding sparsity and to
stability/services of a proven mature relational database technology). Furthermore, evolution of relational
DW implementation technologies exploits data sourcing from RDBMS and on the other hand RDBMS
engines evolution itself incorporates successful concepts from multidimensional technology (both Oracle
and MS SQL Server included MM capabilities into their DB engines).
The textbook coverage (from chapters 8 and 9) provides excellent background with focus on transitioning
from DFM to Dimensional Models (DM) and grounded treatment of aggregate views (materialized views
candidates) and will be further supported in our lectures and assignments by industrial strength models
and experiences (including extensive case studies presented among the additional M4 readings)
following insights primarily from the Kimball’s The Data Warehouse Toolkit”-2ed and the Adamson’s
“Mastering Warehouse Aggregates” so that substantial value is added in terms of realistic examples. In
terms of visual representation our practice will include Dimensional Model diagrams and visualization
(specification of) logical views, both using the case tool ERwin. This module covers both Logical
Modeling and Logical Design for DW strictly in a relational implementation setting. Before outlining
steps and techniques to be elaborated a few terminology/definitional notes are in place.
The multidimensional modeling in relational systems is based on a star-schema and variations known as
snowflake and constellation (multiple-stars sharing dimensions) schemas. See definitions from the
textbook’s pp 221 and on.
Star schema consists of a fact table (typically with a set of measurement attributes) and a set of
corresponding dimensional tables referencing the fact table. A multidimensional view of data is obtained
by combining (SQL joins) fact table with related dimensional tables.
Snowflake schema involves at least partially normalized dimensions, and a constellation schema involves
several starts (facts tables). An outrigger is a secondary dimension not connected to any fact entities.
Logical Modeling
Logical modeling is covered by the textbook’s chapter 8 and elements from additional examples,
industrial case studies, from supplemented cases. The key point in LM with DM is a treatment of
dimensional dynamics (types of changes and modeling alternatives to support various temporal
scenarios). Additional practical solutions for DFM translation into DM require mastery of star schema
model upgrades with bridges, outriggers, mini-dimensions etc. and will require not only careful reading of
ch8 but also a thorough review of selected cases relevant to your term project work.
Secure Best Marks with AI Grader
Need help grading? Try our AI Grader for instant feedback on your assignments.
An outrigger is a secondary dimension attached to a dimension, and a mini dimension is a separated
(cleaved out attributes from) a large dimension (million of records).
Before elaborating details of Dimensional Modeling and Design, let us review principal options for
logical models within relational model (ROLAP) approach and the OMA selection of MVs:
dimens ion
dimens ions
dimension
Logical Modeli ng - D M oti ons
wi thi n R DB MS (ROLAP) D Ws
STORE
storeKey
StoreDesc riptiv eAt tr
country _k ey
country _des criptiv eAtt r
region_key
region_desc riptiv eAt tr
REGION
region_key
region_desc riptiv eAt tr
COUN TR Y
country _key
country _des criptiv eAtt r
region_key (FK)
region_descriptiv eAt tr
STORE
st oreKey
StoreDesc riptiv eAt tr
country _k ey (FK)
country _des criptiv eAtt r
region_key
region_desc riptiv eAt tr
Dimensional elaborate,
easier to conform in DW
Dimensional (DW) consolidated
Traditional (DB) Normalized
STORE
st oreKey
StoreDesc riptiv eAt tr
country _k ey (FK)
COUN TR Y
country _k ey
country _des criptiv eAtt r
region_key (FK)
REGION
region_key
region_desc riptiv eAt tr
(cleaved out attributes from) a large dimension (million of records).
Before elaborating details of Dimensional Modeling and Design, let us review principal options for
logical models within relational model (ROLAP) approach and the OMA selection of MVs:
dimens ion
dimens ions
dimension
Logical Modeli ng - D M oti ons
wi thi n R DB MS (ROLAP) D Ws
STORE
storeKey
StoreDesc riptiv eAt tr
country _k ey
country _des criptiv eAtt r
region_key
region_desc riptiv eAt tr
REGION
region_key
region_desc riptiv eAt tr
COUN TR Y
country _key
country _des criptiv eAtt r
region_key (FK)
region_descriptiv eAt tr
STORE
st oreKey
StoreDesc riptiv eAt tr
country _k ey (FK)
country _des criptiv eAtt r
region_key
region_desc riptiv eAt tr
Dimensional elaborate,
easier to conform in DW
Dimensional (DW) consolidated
Traditional (DB) Normalized
STORE
st oreKey
StoreDesc riptiv eAt tr
country _k ey (FK)
COUN TR Y
country _k ey
country _des criptiv eAtt r
region_key (FK)
REGION
region_key
region_desc riptiv eAt tr
Order of Magnit ude Analy sis (OMA)
Example wit h Produc t Dimension
AllProduct s_1
Category _35
Brand_700
Product _75000
35x
20x
100x ****** bingo!
FACT
day
store country _etc _s mallerSteps
y ear_et c_smallerSteps
BrandDay Store_MV
start with 2+ OMs for ini ti al MVs
Example wit h Produc t Dimension
AllProduct s_1
Category _35
Brand_700
Product _75000
35x
20x
100x ****** bingo!
FACT
day
store country _etc _s mallerSteps
y ear_et c_smallerSteps
BrandDay Store_MV
start with 2+ OMs for ini ti al MVs
SALES
key P (FK)
key D (FK)
key S (FK)
quantity
receipts
unitPrice
numberOf Customers
DATE
key D
date
month
quarter
y ear
day
week
holiday
STORE
key S
store
storeC ity
state
country
sales Manager
sales Dis trict
PRODU CT
key P
produc t
ty pe
category
department
marketingGroup
brand
brandC ity
Logical representation using
natural keys in DW
Alternative logical representation
using convenience keys,
sometimes called DW surrogates
for all entities or a fact surrogate
key only
STORE
key S
Store_s urrogateN umber
store
storeC ity
state
country
sales Manager
sales Dis trict
SALES
key P (FK)
key D (FK)
key S (FK)
quantit y
receipt s
unitPrice
numberOf Customers
DATE
key D
date
month
quarter
y ear
day
week
holiday
PRODU CT
key P
produc t
ty pe
category
depart ment
marketingGroup
brand
brandC ity
Alternative logical representation
using convenience keys,
sometimes called DW surrogates
for all entities or a fact surrogate
key only
PRODU CT
key P
produc t_surogat eN umber
produc t
ty pe
category
depart ment
marketingGroup
brand
brandC ity
STORE
key S
Store_s urrogateN umber
st ore
st oreC ity
st at e
country
sales Manager
sales Dis trict
DATE
key D
date_s urogat eNumber
date
month
quarter
y ear
day
week
holiday
SALES
sales _s urogat eNumber
quantit y
receipt s
unitPrice
numberOf Customers
key D (FK)
key S (FK)
key P (FK)
key P (FK)
key D (FK)
key S (FK)
quantity
receipts
unitPrice
numberOf Customers
DATE
key D
date
month
quarter
y ear
day
week
holiday
STORE
key S
store
storeC ity
state
country
sales Manager
sales Dis trict
PRODU CT
key P
produc t
ty pe
category
department
marketingGroup
brand
brandC ity
Logical representation using
natural keys in DW
Alternative logical representation
using convenience keys,
sometimes called DW surrogates
for all entities or a fact surrogate
key only
STORE
key S
Store_s urrogateN umber
store
storeC ity
state
country
sales Manager
sales Dis trict
SALES
key P (FK)
key D (FK)
key S (FK)
quantit y
receipt s
unitPrice
numberOf Customers
DATE
key D
date
month
quarter
y ear
day
week
holiday
PRODU CT
key P
produc t
ty pe
category
depart ment
marketingGroup
brand
brandC ity
Alternative logical representation
using convenience keys,
sometimes called DW surrogates
for all entities or a fact surrogate
key only
PRODU CT
key P
produc t_surogat eN umber
produc t
ty pe
category
depart ment
marketingGroup
brand
brandC ity
STORE
key S
Store_s urrogateN umber
st ore
st oreC ity
st at e
country
sales Manager
sales Dis trict
DATE
key D
date_s urogat eNumber
date
month
quarter
y ear
day
week
holiday
SALES
sales _s urogat eNumber
quantit y
receipt s
unitPrice
numberOf Customers
key D (FK)
key S (FK)
key P (FK)
Secure Best Marks with AI Grader
Need help grading? Try our AI Grader for instant feedback on your assignments.
Discuss difference between Star schema and a Snowflake schema
Dimensional Models in Erwin:
Star chema vs Snowflake schema (figs 8-3 and 8-5)
select * from sales ft, data dt1, store dt2, productdt3
where ft.keyP =dt3.keyP
and ft.keyS =DT2.keyS
and ft.keyD =dt1.keyD
will produce the cube for analytics,
three more joins would be needed for a snowflake
DATE
keyD
date
month
quarter
year
day
week
holiday
SALES
keyD (FK)
keyS (FK)
keyP (FK)
quantity
receipts
unitprice
nunberOfCustomers
STORE
keyS
store
storeCity
state
country
saleMngr
salesDisctrict
PRODUCT
keyP
product
type
category
department
marketingGroup
brand
brandCity
DATE
keyD
date
month
quarter
year
day
week
holiday
SALES
keyD (FK)
keyS (FK)
keyP (FK)
quantity
receipts
unitprice
nunberOfCustomers
PRODUCT
keyP
product
brand
brandCity
keyT (FK)
STORE
keyS
keyC (FK)
store
saleMngr
salesDisctrict
CITY
keyC
storeCity
country
TYPE
keyT
type
keyCa (FK)
marketingGroup
CATEGORY
keyCa
category
department
Dimensional Models in Erwin:
Star chema vs Snowflake schema (figs 8-3 and 8-5)
select * from sales ft, data dt1, store dt2, productdt3
where ft.keyP =dt3.keyP
and ft.keyS =DT2.keyS
and ft.keyD =dt1.keyD
will produce the cube for analytics,
three more joins would be needed for a snowflake
DATE
keyD
date
month
quarter
year
day
week
holiday
SALES
keyD (FK)
keyS (FK)
keyP (FK)
quantity
receipts
unitprice
nunberOfCustomers
STORE
keyS
store
storeCity
state
country
saleMngr
salesDisctrict
PRODUCT
keyP
product
type
category
department
marketingGroup
brand
brandCity
DATE
keyD
date
month
quarter
year
day
week
holiday
SALES
keyD (FK)
keyS (FK)
keyP (FK)
quantity
receipts
unitprice
nunberOfCustomers
PRODUCT
keyP
product
brand
brandCity
keyT (FK)
STORE
keyS
keyC (FK)
store
saleMngr
salesDisctrict
CITY
keyC
storeCity
country
TYPE
keyT
type
keyCa (FK)
marketingGroup
CATEGORY
keyCa
category
department
Fig 7.1 DV 2.0
DimCountry
countryKey
ISO3Code
name
continent
FactAirpotVisist
dateKey (FK)
airportKey (FK)
countryKey (FK)
roleKey (FK)
visitorCount
DurationOfStay
DimAirport
airportKey
name
code
DimRole
roleKey
name
DimDate
dateKey
year
quarter
month
dayOfMonth
weekday
Conformed dimensions (may be snow flaked, see Reinardi’s lecture on advanced DM)
DimCarrier
carrierKey
carrierCode
uniqueCarrierCode
name
FactConnection
carrierKey (FK)
flightKey (FK)
destinationAirport (FK)
OrigAirport (FK)
dateKey (FK)
CRSDepTime
depTime
depDelay
taxiOut
wheelsOff
taxiIn
arrTime
arvDelay
DimFlight
flightKey
FlightNum
DimAirport
airportKey
name
code
DimDate
dateKey
year
quarter
month
dayOfMonth
weekday
FactAirpotVisist
dateKey (FK)
airportKey (FK)
countryKey (FK)
roleKey (FK)
visitorCount
DurationOfStay
DimRole
roleKey
name
DimCountry
countryKey
ISO3Code
name
continent
DimCountry
countryKey
ISO3Code
name
continent
FactAirpotVisist
dateKey (FK)
airportKey (FK)
countryKey (FK)
roleKey (FK)
visitorCount
DurationOfStay
DimAirport
airportKey
name
code
DimRole
roleKey
name
DimDate
dateKey
year
quarter
month
dayOfMonth
weekday
Conformed dimensions (may be snow flaked, see Reinardi’s lecture on advanced DM)
DimCarrier
carrierKey
carrierCode
uniqueCarrierCode
name
FactConnection
carrierKey (FK)
flightKey (FK)
destinationAirport (FK)
OrigAirport (FK)
dateKey (FK)
CRSDepTime
depTime
depDelay
taxiOut
wheelsOff
taxiIn
arrTime
arvDelay
DimFlight
flightKey
FlightNum
DimAirport
airportKey
name
code
DimDate
dateKey
year
quarter
month
dayOfMonth
weekday
FactAirpotVisist
dateKey (FK)
airportKey (FK)
countryKey (FK)
roleKey (FK)
visitorCount
DurationOfStay
DimRole
roleKey
name
DimCountry
countryKey
ISO3Code
name
continent
Snowflake design
DimCarrier
carrierKey
carrierCode
uniqueCarrierCode
name
groupKey (FK)
FactConnection
carrierKey (FK)
flightKey (FK)
originAirport (FK)
destinationAirport (FK)
dateKey (FK)
CRSDepTime
depTime
depDelay
taxiOut
wheelsOff
taxiIn
arrTime
arvDelay
DimFlight
flightKey
FlightNum
DimAirport
airportKey
name
code
DimDate
dateKey
year
quarter
month
dayOfMonth
weekday
FactAirpotVisist
dateKey (FK)
airportKey (FK)
countryKey (FK)
roleKey (FK)
visitorCount
DurationOfStay
DimRole
roleKey
name
DimCountry
countryKey
ISO3Code
name
continent
DimGroup
groupKey
name
DimCarrier
carrierKey
carrierCode
uniqueCarrierCode
name
groupKey (FK)
FactConnection
carrierKey (FK)
flightKey (FK)
originAirport (FK)
destinationAirport (FK)
dateKey (FK)
CRSDepTime
depTime
depDelay
taxiOut
wheelsOff
taxiIn
arrTime
arvDelay
DimFlight
flightKey
FlightNum
DimAirport
airportKey
name
code
DimDate
dateKey
year
quarter
month
dayOfMonth
weekday
FactAirpotVisist
dateKey (FK)
airportKey (FK)
countryKey (FK)
roleKey (FK)
visitorCount
DurationOfStay
DimRole
roleKey
name
DimCountry
countryKey
ISO3Code
name
continent
DimGroup
groupKey
name
Paraphrase This Document
Need a fresh take? Get an instant paraphrase of this document with our AI Paraphraser
Each fact table has a single focus (on
a particular Process ) from various
view points i.e. analytic dimensions
Not a part of the key
Part of the key
No Nulls Allow ed
Dimension_m
PKey _Dim_m
Location
Location_ID
Additional Terminology
Keys and mandatory data
Keys and Index es TBD
Time_Dimension
Datetime_ID_TBD
DiscussTimes
Dimension1_Subject
Dimension1_ID
Transaction Fact Table
Snapshot (periodic)
Accumulative snapshot
Fact_Focus1
Dimension1_ID (FK)
Location_ID (FK)
Datetime_ID_TBD (FK)
PKey _Dim_m (FK)
CamelsNamingConvention
SurogateKeyRarelyNeed
Example-draf t
Dimensional Model A applicable to
7-Eleven Data Warehouse ~2002
POS_MONTH
loc ationID (FK)
productID (FK)
periodID (FK)
POS_WEEK
loc ationID (FK)
productID (FK)
periodID (FK)
LOCATION
loc ationID
POS_DAY
loc ationID (FK)
productID (FK)
periodID (FK)
PRODUCT
productID
PERIOD
periodID
Note grain of the facts above (all aggregated from row OLTP transactions).
a particular Process ) from various
view points i.e. analytic dimensions
Not a part of the key
Part of the key
No Nulls Allow ed
Dimension_m
PKey _Dim_m
Location
Location_ID
Additional Terminology
Keys and mandatory data
Keys and Index es TBD
Time_Dimension
Datetime_ID_TBD
DiscussTimes
Dimension1_Subject
Dimension1_ID
Transaction Fact Table
Snapshot (periodic)
Accumulative snapshot
Fact_Focus1
Dimension1_ID (FK)
Location_ID (FK)
Datetime_ID_TBD (FK)
PKey _Dim_m (FK)
CamelsNamingConvention
SurogateKeyRarelyNeed
Example-draf t
Dimensional Model A applicable to
7-Eleven Data Warehouse ~2002
POS_MONTH
loc ationID (FK)
productID (FK)
periodID (FK)
POS_WEEK
loc ationID (FK)
productID (FK)
periodID (FK)
LOCATION
loc ationID
POS_DAY
loc ationID (FK)
productID (FK)
periodID (FK)
PRODUCT
productID
PERIOD
periodID
Note grain of the facts above (all aggregated from row OLTP transactions).
Data ty pes and s izes are needed f or
SQL D DL generation
No N ulls
Identif y ing ____
Non-ident if y ing - - -
Start from here w ith ERw in
participating dimensions are required
but not all represent the grain
Salesperson
SalespersonI D
SalespersonN ame
Terit ory Name
Region
Cust omer
Cust omerID
Cust omerN ame
Cust omerC ode
ShippingAddress
BillingAddress
OrderD at e
Date
Mont h
Quarter
Y ear
Product
Product ID
Product Name
Brand
OrderMeasure
Product ID (FK)
Date (FK)
Cust omerID (FK)
SalespersonI D (FK)
OrderD ollars
Cost
MarginD ollars
Quantity Sold
COMPONEN T
component _k ey
Component _part _number
component _name
unit_of _measure
category
COMPONEN T_U SE
component _k ey (FK)
time_k ey (FK)
produc t_line_k ey (FK)
produc t_key (FK)
us age_quantit y
MONTH
time_k ey
month_name
ACTIVI TY _FACT
ac tiv it y _k ey (FK)
produc t_line_k ey (FK)
time_k ey (FK)
Hours
TI ME
time_k ey
date
ACTIVI TY
ac tiv it y _k ey
ac tiv it y
LI NE
produc t_line_k ey
line_name
f ac ility
country
ty pe
PRODU CT
produc t_key
model_number
f amily
PRODU CTION_FAC T
produc t_line_k ey (FK)
produc t_key (FK)
time_k ey (FK)
units _produced_qty
Note ‘production line’ arrangement of facts
SQL D DL generation
No N ulls
Identif y ing ____
Non-ident if y ing - - -
Start from here w ith ERw in
participating dimensions are required
but not all represent the grain
Salesperson
SalespersonI D
SalespersonN ame
Terit ory Name
Region
Cust omer
Cust omerID
Cust omerN ame
Cust omerC ode
ShippingAddress
BillingAddress
OrderD at e
Date
Mont h
Quarter
Y ear
Product
Product ID
Product Name
Brand
OrderMeasure
Product ID (FK)
Date (FK)
Cust omerID (FK)
SalespersonI D (FK)
OrderD ollars
Cost
MarginD ollars
Quantity Sold
COMPONEN T
component _k ey
Component _part _number
component _name
unit_of _measure
category
COMPONEN T_U SE
component _k ey (FK)
time_k ey (FK)
produc t_line_k ey (FK)
produc t_key (FK)
us age_quantit y
MONTH
time_k ey
month_name
ACTIVI TY _FACT
ac tiv it y _k ey (FK)
produc t_line_k ey (FK)
time_k ey (FK)
Hours
TI ME
time_k ey
date
ACTIVI TY
ac tiv it y _k ey
ac tiv it y
LI NE
produc t_line_k ey
line_name
f ac ility
country
ty pe
PRODU CT
produc t_key
model_number
f amily
PRODU CTION_FAC T
produc t_line_k ey (FK)
produc t_key (FK)
time_k ey (FK)
units _produced_qty
Note ‘production line’ arrangement of facts
BULK_PR OD UCTION_FACT
pr_k ey (FK)
produc t_key (FK)
f ac ilty _key (FK)
time_k ey (FK)
produc ed_qty
PRODU CT
produc t_key
produc t_name
receipe
produc tion_unit_of _meas ure
produc tion_manager
FACILI TY
f ac ilty _key
f ac ilty _name
f ac ility -loc ation
stat e
manager
IN GR EDI ENT
ingredient _k ey
ingredient _name
unit_of _measure
ingredient
suplier
f orm
TIME
time_k ey
date
month_name
quarter
y ear
REC IPE_FACT
ingredient _k ey (FK)
produc t_key (FK)
standard-quantit y
IN GR EDI ENT_USAGE
produc t_key (FK)
f ac ilty _key (FK)
pr_k ey (FK)
time_k ey (FK)
ingredient _k ey (FK)
quantit y
total_c os t
PRODU CTION_R UN
pr_k ey
capacit y _uniut_of _meas ure
line
pl_monthly _c apacity
pr_k ey (FK)
produc t_key (FK)
f ac ilty _key (FK)
time_k ey (FK)
produc ed_qty
PRODU CT
produc t_key
produc t_name
receipe
produc tion_unit_of _meas ure
produc tion_manager
FACILI TY
f ac ilty _key
f ac ilty _name
f ac ility -loc ation
stat e
manager
IN GR EDI ENT
ingredient _k ey
ingredient _name
unit_of _measure
ingredient
suplier
f orm
TIME
time_k ey
date
month_name
quarter
y ear
REC IPE_FACT
ingredient _k ey (FK)
produc t_key (FK)
standard-quantit y
IN GR EDI ENT_USAGE
produc t_key (FK)
f ac ilty _key (FK)
pr_k ey (FK)
time_k ey (FK)
ingredient _k ey (FK)
quantit y
total_c os t
PRODU CTION_R UN
pr_k ey
capacit y _uniut_of _meas ure
line
pl_monthly _c apacity
Secure Best Marks with AI Grader
Need help grading? Try our AI Grader for instant feedback on your assignments.
You may want to expand this model
later with more attri butes. ..
Example/Exercise
Use ERwin to model DW
and to create SQL DDL
(Fig 2.3 in Kimball DWT)
Measured Facts
in the Retail Schema
Promot ion
Promot ion_Key
Promot ionAtt ributes
Store
Store_Key
StoreAtt ribut es
Product
Product _Key
Product Att ributes
POS_Transact ion_Fac t
POS_Transact ionN umber
Product _Key (FK)
Store_Key (FK)
Date_Key (FK)
Promot ion_Key (FK)
SalesQuant ity
SalesD ollarAmount
CostDollarAmount
GrossProf it DollarAmount
DateDimens ion
Date_Key
DateAtt ribut es
CUSTOMER
MA RKET
PRODUCT
PERIOD SALES
The Star Schema, i.e.
Dimensional Model answers
an essential question:
What are the 'angles' for
analyzing Sales data?
Dimensions are perspectives
Sales facts (the most frequent data)
are basically measurements.
Step 1: Categories in DM
SALESPERSON
later with more attri butes. ..
Example/Exercise
Use ERwin to model DW
and to create SQL DDL
(Fig 2.3 in Kimball DWT)
Measured Facts
in the Retail Schema
Promot ion
Promot ion_Key
Promot ionAtt ributes
Store
Store_Key
StoreAtt ribut es
Product
Product _Key
Product Att ributes
POS_Transact ion_Fac t
POS_Transact ionN umber
Product _Key (FK)
Store_Key (FK)
Date_Key (FK)
Promot ion_Key (FK)
SalesQuant ity
SalesD ollarAmount
CostDollarAmount
GrossProf it DollarAmount
DateDimens ion
Date_Key
DateAtt ribut es
CUSTOMER
MA RKET
PRODUCT
PERIOD SALES
The Star Schema, i.e.
Dimensional Model answers
an essential question:
What are the 'angles' for
analyzing Sales data?
Dimensions are perspectives
Sales facts (the most frequent data)
are basically measurements.
Step 1: Categories in DM
SALESPERSON
Ty pe II
- adds dimension row
and a 'times tamp'
rel evant metamodel segment:
Mix ing of (Ty pe I , Ty pe II , etc .)
policies require some f orm of met adat a
(a k ey prot ion of a met amodel is shown
bellow).
Ty pe II - Today or y est erday sc enario
- ty pically f or slow pace of change,
ty pe II s olution approac h prov ides key
capability t o preserv e pas t rec ords by
creating new dimens ional records ev ery
time ane or more att ribut e v alues
(s eleced f or t y pe II ) changes;
this solut ion (see model on the right)
requires surrogate k ey s f or dimensional
records and explicit 't ime of change'
ref erences. THis solut ion is mos t
f requently us ed in prof ess ional D W,
note that not all of t he att ributes are
shown!
TY PEATTRI BUTEDIMEN SION
It is pos sible to addopt t his st rategy
only f or a subset of dimens ional
attbut es and use the st andadized D W
ref resh c y cle not only t o add f act s
data but t o updat e dimens ions (this
works f or Ty pe I and f or Ty pe I I);
Ty pe I- Today f or y es terday s cenario,
while the ov erwrite is s implest
approach pos sible it is not recommended
as it t reat s dimensional rec ords as if
in a database (only the las t v alue s et
will be pres erv ed) - but adv antage is
that no change to a minimal logic al
st ruct ure of DM is needed. In prac tic al
terms t his s trategy is f easible: a) f or
correc tins, and b) only in case of
at tibutes of v ery lit tle v alue.
Dimensional Change
the key isue for
Logical Modeling of DW
FACT
subject _s urrogateN umber (FK)
Datetime (FK)
Plac e_surrogateNumber (FK)
SUBJ ECT
subject _s urrogateN umber
DatetimeOf C hange (FK)
TIME
Datetime
PLAC E
Plac e_surrogateNumber
DatetimeOf C hange (FK)
- adds dimension row
and a 'times tamp'
rel evant metamodel segment:
Mix ing of (Ty pe I , Ty pe II , etc .)
policies require some f orm of met adat a
(a k ey prot ion of a met amodel is shown
bellow).
Ty pe II - Today or y est erday sc enario
- ty pically f or slow pace of change,
ty pe II s olution approac h prov ides key
capability t o preserv e pas t rec ords by
creating new dimens ional records ev ery
time ane or more att ribut e v alues
(s eleced f or t y pe II ) changes;
this solut ion (see model on the right)
requires surrogate k ey s f or dimensional
records and explicit 't ime of change'
ref erences. THis solut ion is mos t
f requently us ed in prof ess ional D W,
note that not all of t he att ributes are
shown!
TY PEATTRI BUTEDIMEN SION
It is pos sible to addopt t his st rategy
only f or a subset of dimens ional
attbut es and use the st andadized D W
ref resh c y cle not only t o add f act s
data but t o updat e dimens ions (this
works f or Ty pe I and f or Ty pe I I);
Ty pe I- Today f or y es terday s cenario,
while the ov erwrite is s implest
approach pos sible it is not recommended
as it t reat s dimensional rec ords as if
in a database (only the las t v alue s et
will be pres erv ed) - but adv antage is
that no change to a minimal logic al
st ruct ure of DM is needed. In prac tic al
terms t his s trategy is f easible: a) f or
correc tins, and b) only in case of
at tibutes of v ery lit tle v alue.
Dimensional Change
the key isue for
Logical Modeling of DW
FACT
subject _s urrogateN umber (FK)
Datetime (FK)
Plac e_surrogateNumber (FK)
SUBJ ECT
subject _s urrogateN umber
DatetimeOf C hange (FK)
TIME
Datetime
PLAC E
Plac e_surrogateNumber
DatetimeOf C hange (FK)
FACT
subject _s urrogat eN umber (FK)
Datetime (FK)
Plac e_surrogateNumber (FK)
SUBJ ECT
subject _s urrogat eN umber
DatetimeOf C hange (FK)
DatetimeValidTo (FK)
subject NaturalKey _Mas ter
TI ME
Datetime
PLAC E
Plac e_surrogateNumber
DatetimeOf C hange (FK)
DatetimeValidTo (FK)
placeNaturalKey _Mas ter
Ty pe II I- This is the case when there
is a legitimate buisnes s need f or
alt ernate reality (us ually what if
analy s is) where bot h sc enarios (t oday
f or y est erday and y es terday f or today )
are ex pect ed. This is used rat her
inf requent ly and only f or two st at es ,
if t he whole history part itioning is
needed than ty pe I I approach is to be
us ed.
Model has a c lear weak ness as only one
time att ribut e is prov ided with chnaged
v alue and only one time ref erence is
prov ided (obv ious ly a c ombined s olution
with s et of redundant at tribute pairs
(Prev oius Value c olumn and
DatetimeOf C hange (pairs ) is t ec hnically
poss ible but is of v ery limit ed v alue
despite it s obv ous c omplex ity (and
irregularit y ) and is not used in
prac tic e.
Dis cuss impact on queries t reating
ty pes of c hanges as Patt erns!
ANY _D IMENSION
surrogat eKey Number
Prev iousAt tribut eTy peII
CurrentAt tributeTy peII
Modif ic at ionDat et ime
Type IV
-Ful ly Logged Changes
Ty pe II I
- adds a dimens ional column
in a t y pe I I like solut ions, but no
new dimensional records
Pract ical c rit erion f or rec ognizing
'slowly ' c hanging dimens ional
at tributes is bimontly or less
f requent. For the mont ly or more
f requent changes , es pecially in case of
'large dimensions (million records or
more), Kimbal recomends t o break
f reqent ly c hanging at tributes into
minidimensions.
Tis apprpoach supports ev ery temporal
sc enario,
Note regarding potential for research: a) 6NF, b) Anchor Modeling (goggle), c) Column-based, and d)
Dimensions in context of dimensions (change in time, for subject, per locality – scope, per convention).
subject _s urrogat eN umber (FK)
Datetime (FK)
Plac e_surrogateNumber (FK)
SUBJ ECT
subject _s urrogat eN umber
DatetimeOf C hange (FK)
DatetimeValidTo (FK)
subject NaturalKey _Mas ter
TI ME
Datetime
PLAC E
Plac e_surrogateNumber
DatetimeOf C hange (FK)
DatetimeValidTo (FK)
placeNaturalKey _Mas ter
Ty pe II I- This is the case when there
is a legitimate buisnes s need f or
alt ernate reality (us ually what if
analy s is) where bot h sc enarios (t oday
f or y est erday and y es terday f or today )
are ex pect ed. This is used rat her
inf requent ly and only f or two st at es ,
if t he whole history part itioning is
needed than ty pe I I approach is to be
us ed.
Model has a c lear weak ness as only one
time att ribut e is prov ided with chnaged
v alue and only one time ref erence is
prov ided (obv ious ly a c ombined s olution
with s et of redundant at tribute pairs
(Prev oius Value c olumn and
DatetimeOf C hange (pairs ) is t ec hnically
poss ible but is of v ery limit ed v alue
despite it s obv ous c omplex ity (and
irregularit y ) and is not used in
prac tic e.
Dis cuss impact on queries t reating
ty pes of c hanges as Patt erns!
ANY _D IMENSION
surrogat eKey Number
Prev iousAt tribut eTy peII
CurrentAt tributeTy peII
Modif ic at ionDat et ime
Type IV
-Ful ly Logged Changes
Ty pe II I
- adds a dimens ional column
in a t y pe I I like solut ions, but no
new dimensional records
Pract ical c rit erion f or rec ognizing
'slowly ' c hanging dimens ional
at tributes is bimontly or less
f requent. For the mont ly or more
f requent changes , es pecially in case of
'large dimensions (million records or
more), Kimbal recomends t o break
f reqent ly c hanging at tributes into
minidimensions.
Tis apprpoach supports ev ery temporal
sc enario,
Note regarding potential for research: a) 6NF, b) Anchor Modeling (goggle), c) Column-based, and d)
Dimensions in context of dimensions (change in time, for subject, per locality – scope, per convention).
Paraphrase This Document
Need a fresh take? Get an instant paraphrase of this document with our AI Paraphraser
ETC
FACT
cust omerD emographics Key (FK)
cust omerN umber (FK)
Fac tN umbers
CU STOMER _DEMOGRAPH ICS
cust omerD emographics Key
demongraphic sAtt ribut es
CU STOMER
cust omerN umber
cust omerAt tribut es
Minidimens ions, with a pract ical upper
limit 10**5 records
f requent changes , es pecially in case of
'large dimensions (million records or
more), Kimbal recomends t o break
f reqent ly c hanging at tributes into
minidimensions.
FACT
cust omerD emographics Key (FK)
cust omerN umber (FK)
Fac tN umbers
CU STOMER _DEMOGRAPH ICS
cust omerD emographics Key
demongraphic sAtt ribut es
CU STOMER
cust omerN umber
cust omerAt tribut es
Minidimens ions, with a pract ical upper
limit 10**5 records
f requent changes , es pecially in case of
'large dimensions (million records or
more), Kimbal recomends t o break
f reqent ly c hanging at tributes into
minidimensions.
The second main point in M4 is representing views of interest (suggested) for materialization:
V1_product_date_store
V3_category_month_storeCity V4_type__month__state
A sample of sales schema view s that can
be materialized
Thius ERw in style sheet is to be us ed
f or planning MVs, the format is:
V#_termDi_termDj_termDk etc.
V# generally indic ates order of
aggregation f rom a primary dimension
lev els up and relationship depicts
aggregation path f rom a dot up
show ing dependency direction.
Partial order f ollow s dependency,
here: V1, V2, (V3,V4), V 5
V2_type_date_storeCity
V5_quarter_state
V1_product_date_store
V3_category_month_storeCity V4_type__month__state
A sample of sales schema view s that can
be materialized
Thius ERw in style sheet is to be us ed
f or planning MVs, the format is:
V#_termDi_termDj_termDk etc.
V# generally indic ates order of
aggregation f rom a primary dimension
lev els up and relationship depicts
aggregation path f rom a dot up
show ing dependency direction.
Partial order f ollow s dependency,
here: V1, V2, (V3,V4), V 5
V2_type_date_storeCity
V5_quarter_state
Logical Design (for Data Marts, as DV was treated separately as EDW Logical/physical model)
Logical design integrates activities and technologies useful in producing a logical model for direct
relational implementation. The main steps are:
- translating fact (DFM) schema into logical schema (star, snowflake, and constellation),
- representing application oriented, i.e. presentation views (this is covered in more detail only in
accompanying cases and within the Module 6 in addressing reporting and BI consolidation dealing with a
presentation tier of a DW and mapping of dimensions and facts to analytical front end including tools
such as simple Excel/SAS JMP, or more elaborate and expensive third party BI/dashboard tools),
- representing aggregates (materializing views),
- specifying logical solutions for eventual archiving (delete), replications /backups for disaster recovery,
and possibly considering deduplication technology (in maintaining data quality, or implementing MDM
solution)
- fragmenting fact tables vertically (typically isolating measures possibly at different grain), horizontally
(especially time partitioning, isolating records or by more esoteric criteria- not frequent), and creating
cross-fact views.
Technical details are excellently covered by the ch9, particularly view materialization and fragmentation,
and will not be repeated here except for a few clarifying (expanding) examples of logical design, starting
with resolution of cross-dimensional attributes (i.e. secondary facts):
...
...
Fact
Conformed Dimensi ons- Bus
TIME
Datetime
year
quarter
month
week
day
holiday
etc
Cross Dimensinal attribute VAT
is resolv ed as a secondary fact
VA T
Datetime (FK)
country (FK)
category (FK)
VA T
PRODUCT
produc tKey
produc t
type
category
department
marketingGroup
brand
ST ORE
storeKey
store
storeCity
country
salesManager
SA LES
timeKey (FK)
produc tKey (FK)
storeKey (FK)
quantity
unitP rice
rec eipts
numberOfCus tomers
Logical design integrates activities and technologies useful in producing a logical model for direct
relational implementation. The main steps are:
- translating fact (DFM) schema into logical schema (star, snowflake, and constellation),
- representing application oriented, i.e. presentation views (this is covered in more detail only in
accompanying cases and within the Module 6 in addressing reporting and BI consolidation dealing with a
presentation tier of a DW and mapping of dimensions and facts to analytical front end including tools
such as simple Excel/SAS JMP, or more elaborate and expensive third party BI/dashboard tools),
- representing aggregates (materializing views),
- specifying logical solutions for eventual archiving (delete), replications /backups for disaster recovery,
and possibly considering deduplication technology (in maintaining data quality, or implementing MDM
solution)
- fragmenting fact tables vertically (typically isolating measures possibly at different grain), horizontally
(especially time partitioning, isolating records or by more esoteric criteria- not frequent), and creating
cross-fact views.
Technical details are excellently covered by the ch9, particularly view materialization and fragmentation,
and will not be repeated here except for a few clarifying (expanding) examples of logical design, starting
with resolution of cross-dimensional attributes (i.e. secondary facts):
...
...
Fact
Conformed Dimensi ons- Bus
TIME
Datetime
year
quarter
month
week
day
holiday
etc
Cross Dimensinal attribute VAT
is resolv ed as a secondary fact
VA T
Datetime (FK)
country (FK)
category (FK)
VA T
PRODUCT
produc tKey
produc t
type
category
department
marketingGroup
brand
ST ORE
storeKey
store
storeCity
country
salesManager
SA LES
timeKey (FK)
produc tKey (FK)
storeKey (FK)
quantity
unitP rice
rec eipts
numberOfCus tomers
Secure Best Marks with AI Grader
Need help grading? Try our AI Grader for instant feedback on your assignments.
modeling multiple arcs as a dimensional
bridge (nominaly a secodary fact)
...
Fact
BRIDGE-AUTHOR
keyA (FK)
keyB (FK)
AUTHOR
keyA
BOOK
keyB
SALES
keyB (FK)
keyTime
Remember that textbook uses cross dimensional attribute for what DrJ claims a better name “secondary
fact” (not the secondary events, those are different level of granularity i.e. grades for a fact event)
The above rendering of star schemas shows use of natural keys, an option nowadays consider obsolete in
Data Marts (not so in DV implementations, as while DV 1.0 used sequenced surrogates the DV 2.0
standard practice uses hashed business keys). We recommend use of surrogate keys at least for all fact
tables in DM.
bridge (nominaly a secodary fact)
...
Fact
BRIDGE-AUTHOR
keyA (FK)
keyB (FK)
AUTHOR
keyA
BOOK
keyB
SALES
keyB (FK)
keyTime
Remember that textbook uses cross dimensional attribute for what DrJ claims a better name “secondary
fact” (not the secondary events, those are different level of granularity i.e. grades for a fact event)
The above rendering of star schemas shows use of natural keys, an option nowadays consider obsolete in
Data Marts (not so in DV implementations, as while DV 1.0 used sequenced surrogates the DV 2.0
standard practice uses hashed business keys). We recommend use of surrogate keys at least for all fact
tables in DM.
G9
G8_bG7_d
G6_aG5_b_dG4_c
G3_a_dG2_b_c
G1_a_c
Multidimensional latic e for the above
bacd FACT
G8_bG7_d
G6_aG5_b_dG4_c
G3_a_dG2_b_c
G1_a_c
Multidimensional latic e for the above
bacd FACT
M4- Self-Assessment Exercises for Practice
Exercise 1.
Design a logical schema (Dimensional Model) for PURCHASE fact schema solution for preliminary
exercise in M3-Assignment.
Try your own before looking into a suggested solution with discussion below.
Exercise 1.
Design a logical schema (Dimensional Model) for PURCHASE fact schema solution for preliminary
exercise in M3-Assignment.
Try your own before looking into a suggested solution with discussion below.
Paraphrase This Document
Need a fresh take? Get an instant paraphrase of this document with our AI Paraphraser
Exercise 1 recommended solution with discussion:
General note, during the logical design for Data Marts (i.e. star schemas and their translation into
relations) fact tables may be represented with composite PK (out of dimension tables natural PK)
most designers/developers use surrogagte PK explicitely (for Dimensnions that is a standard
prasctice, see textbook pp 223 for more info, and some even for fact tables, see Kimball’s Design
tip # 81) – this practice has maintenance advantages while storage options needs to be researched
(experiment vith alternatives for your project).
Note here 'imperfect' ref to space
dimension: Countr y
HOW this table is not nec es s arily needed in
a D ata Mart as job, income range and
ev en quality c an be migrated to Fact as
degenerate dimensions (like Curr enc y
and AgeRange) to minimize joins ; this
is a particularly freqent prac tic e with
aggr egated fact tables (as the v olume
inc reas e may not be prohibitiv e).
WHER E
HOW
WHEN
FT_PU RC HASE
Purc has eSurogateNumber
CardTy peID ( FK)
Date ( FK)
J ob (FK)
Inc omeR ange (FK)
StoreSurrogateID (FK)
Curr enc y
AgeR ange
Number
Amount
Ex changeRate
Cr editLimit
DT_STOR E
StoreSurrogateID
Store
StoreTy pe
Area
Country
DT_D ATE
Date
Day OfWeek
Holliday
Week
Month
Year
DT_C AR D_TYPE
CardTy peID
Network
Iss uingBank
Country
Color
BT_QUALITY
J ob
Inc omeR ange
quality
The solution Data Mart above is based on the DFM analysis for practice from M3. Please take a
notice that in all data mart designs a time dimension is paramount in order to enable cumulative
reporting along the time line. Note that IncomeRange and Job may be treated as degenerate
dimensions and like Currency and AgeRange incorporated into the fact table FT_PURCHASE to
save on joins. In the suggested solution for M3A1-E1 the holder’s personal data were grafted and
demografic data aggregated. A plausible solution (if privacy issue is legaly resolved and
customer dimension is explicitly recognized) could be made at granularity level of individual
purchase (by WHO the Customer , that is the holder, and the HOW by Credit-card (CC where
card type will be incorporated). The specific limitation of this exercise in PURCHASE with CC
is that (WHAT) PRODUCTis not captured.
General note, during the logical design for Data Marts (i.e. star schemas and their translation into
relations) fact tables may be represented with composite PK (out of dimension tables natural PK)
most designers/developers use surrogagte PK explicitely (for Dimensnions that is a standard
prasctice, see textbook pp 223 for more info, and some even for fact tables, see Kimball’s Design
tip # 81) – this practice has maintenance advantages while storage options needs to be researched
(experiment vith alternatives for your project).
Note here 'imperfect' ref to space
dimension: Countr y
HOW this table is not nec es s arily needed in
a D ata Mart as job, income range and
ev en quality c an be migrated to Fact as
degenerate dimensions (like Curr enc y
and AgeRange) to minimize joins ; this
is a particularly freqent prac tic e with
aggr egated fact tables (as the v olume
inc reas e may not be prohibitiv e).
WHER E
HOW
WHEN
FT_PU RC HASE
Purc has eSurogateNumber
CardTy peID ( FK)
Date ( FK)
J ob (FK)
Inc omeR ange (FK)
StoreSurrogateID (FK)
Curr enc y
AgeR ange
Number
Amount
Ex changeRate
Cr editLimit
DT_STOR E
StoreSurrogateID
Store
StoreTy pe
Area
Country
DT_D ATE
Date
Day OfWeek
Holliday
Week
Month
Year
DT_C AR D_TYPE
CardTy peID
Network
Iss uingBank
Country
Color
BT_QUALITY
J ob
Inc omeR ange
quality
The solution Data Mart above is based on the DFM analysis for practice from M3. Please take a
notice that in all data mart designs a time dimension is paramount in order to enable cumulative
reporting along the time line. Note that IncomeRange and Job may be treated as degenerate
dimensions and like Currency and AgeRange incorporated into the fact table FT_PURCHASE to
save on joins. In the suggested solution for M3A1-E1 the holder’s personal data were grafted and
demografic data aggregated. A plausible solution (if privacy issue is legaly resolved and
customer dimension is explicitly recognized) could be made at granularity level of individual
purchase (by WHO the Customer , that is the holder, and the HOW by Credit-card (CC where
card type will be incorporated). The specific limitation of this exercise in PURCHASE with CC
is that (WHAT) PRODUCTis not captured.
Exercise 2.
In the fact schema below, each attribute takes 100 bytes and the fact measures take 300 bytes as
a whole. The fact cardinality is 100.000, and the other cardinalities are as follows: card(a) =
1000, card(b) = 500, card(c) = 20, card(d) = 50, and card(e) = 10. Find the plausible logical
solutions and estimate their storage space.
Exercise 2-Solution 1 (straightforward):
Space required (ignoring fact table, same in all options)
= (1k x (4+100+100)Bytes) + (50 x (4+100)Bytes) + (25k x (4+100+100+100)Bytes)
= 204kB + 5.2kB + 7600kB
≈ 7.8MB
Exercise 2- Solution 2 (surrogate key for b attribute):
Space required (ignoring fact table, as in all options)
= (1k x (4+100+100+4)Bytes) + (50 x (4+100)Bytes) + (25k x (4+4+100+100)Bytes)
= 208kB + 5.2kB + 5200kB ≈ 5.41MB
Exercise 2- Solution 3 (snowflaking on b, extension of solution 2):
In the fact schema below, each attribute takes 100 bytes and the fact measures take 300 bytes as
a whole. The fact cardinality is 100.000, and the other cardinalities are as follows: card(a) =
1000, card(b) = 500, card(c) = 20, card(d) = 50, and card(e) = 10. Find the plausible logical
solutions and estimate their storage space.
Exercise 2-Solution 1 (straightforward):
Space required (ignoring fact table, same in all options)
= (1k x (4+100+100)Bytes) + (50 x (4+100)Bytes) + (25k x (4+100+100+100)Bytes)
= 204kB + 5.2kB + 7600kB
≈ 7.8MB
Exercise 2- Solution 2 (surrogate key for b attribute):
Space required (ignoring fact table, as in all options)
= (1k x (4+100+100+4)Bytes) + (50 x (4+100)Bytes) + (25k x (4+4+100+100)Bytes)
= 208kB + 5.2kB + 5200kB ≈ 5.41MB
Exercise 2- Solution 3 (snowflaking on b, extension of solution 2):
NOTE: This solution offers a very small space savings compared to the previous one, this is because of
the small difference in cardinality between B and A (500 vs 1000). It is questionable to require extra
joins for such a small space savings.
Space required (ignoring fact table as in other options)
= (1k x (4+100+4)B) + (50 x (4+100)B) + (25k x (4+4+100+100)B) + (500 x (4+100)B)
= 108kB + 5.2kB + 5200kB + 52k
≈ 5.37MB
Exercise 2- Solution 4 (snowflaking where it actually matters):
NOTE: The previous solution only snowflakes B but ignores E. This seems an odd choice; since E has
the lowest cardinality of all attributes (10) and its parent C has the highest cardinality (25,000), it is
actually by far the best candidate for snowflaking! Here, we are trading off one extra join for nearly 50%
space savings, unlike above when we traded a join for next to nothing.
Space required (ignoring fact table, as in other options)
= (1k x (4+100+4)B) + (50 x (4+100)B) + (25k x (4+4+100+4)B) + (500 x (4+100)B)
+ (10 x (4+100)B)
= 108kB + 5.2kB + 2800kB + 52k + 1.04kB
≈ 2.9MB <--- almost 50% better than best offered solution!
(we may also “undo-snowflake” B, for a combination between solutions 2 and 4, this would be a tiny
fraction of space larger than solution 4, but would remove the join between A and B)
the small difference in cardinality between B and A (500 vs 1000). It is questionable to require extra
joins for such a small space savings.
Space required (ignoring fact table as in other options)
= (1k x (4+100+4)B) + (50 x (4+100)B) + (25k x (4+4+100+100)B) + (500 x (4+100)B)
= 108kB + 5.2kB + 5200kB + 52k
≈ 5.37MB
Exercise 2- Solution 4 (snowflaking where it actually matters):
NOTE: The previous solution only snowflakes B but ignores E. This seems an odd choice; since E has
the lowest cardinality of all attributes (10) and its parent C has the highest cardinality (25,000), it is
actually by far the best candidate for snowflaking! Here, we are trading off one extra join for nearly 50%
space savings, unlike above when we traded a join for next to nothing.
Space required (ignoring fact table, as in other options)
= (1k x (4+100+4)B) + (50 x (4+100)B) + (25k x (4+4+100+4)B) + (500 x (4+100)B)
+ (10 x (4+100)B)
= 108kB + 5.2kB + 2800kB + 52k + 1.04kB
≈ 2.9MB <--- almost 50% better than best offered solution!
(we may also “undo-snowflake” B, for a combination between solutions 2 and 4, this would be a tiny
fraction of space larger than solution 4, but would remove the join between A and B)
Secure Best Marks with AI Grader
Need help grading? Try our AI Grader for instant feedback on your assignments.
Exercise 3. In the fact schema below, attribute a takes 100 bytes and the fact measures take 300
as a whole. The fact cardinality is 1.000.000, and it takes 1000 distinct values. Each primary
event is associated with 3 values of a in the average. Find the possible logical design solutions
and estimate their storage space.
Exercise 3 solution alternatives:
In the 1st alternative attribute ‘A’ is directly included in both the fact and the bridge table (note above fact
schema implies different values of a):
FACT[x 1,000,000]
Fact_Surrogate[4 Bytes]
Attribute_A [100 Bytes]
Measures[300 Bytes]
BRIDGE[3 x 1,000,000]
Fact_Surrogate[4 Bytes] (FK)
Attribute_A[100 Bytes]
Weight[4 Bytes]
Space calculation:
BRIDGE(A, idGroup, weight) (3 x 1000000) tuples x (100+4+4) bytes= 324000000 bytes
FACT (A, idGroup,…)1000000 tuples x (100+4+300) bytes= 404000000 bytes
Total Space = 728,000,000 bytes
In the 2nd solution alternative surrogate is included for the attribute to reduce storage space (a standard
solution generally with a number maintenance related advantages)
FACT[x 1,000,000]
Fact_Surrogate[4 Bytes]
Measures[300 Bytes]
A_Surrogate[4 Bytes] (FK)
BRIDGE[3 x 1,000,000]
Fact_Surrogate[4 Bytes] (FK)
A_Surrogate[4 Bytes] (FK)
Weight[4 Bytes]
DIMENSION_A[x 1,000]
A_Surrogate[4 Bytes]
Attribute_A[100 Bytes]
DT_A(idA, a) 1000 tuples x (4+100) bytes = 104000 bytes
BRIDGE(idA:DT_A, idGroup, weight) (3 x 1000000) tuples x (4+4+4) bytes = 36000000 bytes
FT(idA:DT_A, idGroup,…) 1000000 tuples x (4+4+300) bytes= 308000000 bytes
Total Space = 344,104,000 bytes
as a whole. The fact cardinality is 1.000.000, and it takes 1000 distinct values. Each primary
event is associated with 3 values of a in the average. Find the possible logical design solutions
and estimate their storage space.
Exercise 3 solution alternatives:
In the 1st alternative attribute ‘A’ is directly included in both the fact and the bridge table (note above fact
schema implies different values of a):
FACT[x 1,000,000]
Fact_Surrogate[4 Bytes]
Attribute_A [100 Bytes]
Measures[300 Bytes]
BRIDGE[3 x 1,000,000]
Fact_Surrogate[4 Bytes] (FK)
Attribute_A[100 Bytes]
Weight[4 Bytes]
Space calculation:
BRIDGE(A, idGroup, weight) (3 x 1000000) tuples x (100+4+4) bytes= 324000000 bytes
FACT (A, idGroup,…)1000000 tuples x (100+4+300) bytes= 404000000 bytes
Total Space = 728,000,000 bytes
In the 2nd solution alternative surrogate is included for the attribute to reduce storage space (a standard
solution generally with a number maintenance related advantages)
FACT[x 1,000,000]
Fact_Surrogate[4 Bytes]
Measures[300 Bytes]
A_Surrogate[4 Bytes] (FK)
BRIDGE[3 x 1,000,000]
Fact_Surrogate[4 Bytes] (FK)
A_Surrogate[4 Bytes] (FK)
Weight[4 Bytes]
DIMENSION_A[x 1,000]
A_Surrogate[4 Bytes]
Attribute_A[100 Bytes]
DT_A(idA, a) 1000 tuples x (4+100) bytes = 104000 bytes
BRIDGE(idA:DT_A, idGroup, weight) (3 x 1000000) tuples x (4+4+4) bytes = 36000000 bytes
FT(idA:DT_A, idGroup,…) 1000000 tuples x (4+4+300) bytes= 308000000 bytes
Total Space = 344,104,000 bytes
In addition to readings and examples from M4 Dimensional Modeling - Cases, set of cases is particularly
relevant here as you develop your term project dimensional models. Cases covered in the readings for this
course include:
Case 1- Wall Mart comprehensive example of DM (current in 2000),
Case 2- Retail Introduction, a simple model elaboration for reviewing basic DM concepts,
Case 3- Inventory, with essentials of a value chain, snapshots, and conformed dimensions,
Case 4- Procurement, providing basic approach to tracking slowly changing dimensions,
Case 5- Order Management, reviewing more details of DM
Case 6- CRM, introducing mini-dimensions, hierarchies, and presentation views,
Case 7- Accounting, presenting GL, budgeting, and fiscal calendars (see related Calendar case on CD),
Case 8- HR, introducing keywords as outrigger, and the audit dimension with exposed metadata,
Case 9- Banking with bridge, value banding, point in time balances, and heterogeneous product schemas,
Case10-Telecom with realistic design review example, geographic location dimension elaboration,
Case11-Transportation with fact tables at different granularity levels, time vs. date and time zones,
Case12-University with accumulating snapshots, fact-less fact tables, and handling of non existing events,
Case13-Health Care with additional details re roles, multi-valued dimensions, and complex events,
Case14-Electronic Commerce, with click stream analysis and web systems profitability,
Case15-Insurance: requirements traceability, and mini-dimensions for large, rapidly changing
dimensions.
Case16- Production
Case17- US Census
Recommended related readings (essential for DW professionals):
R. Kimball, M. Ross”The Data Warehouse Toolkit”3ed, Wiley 2013,
C. Adamson, M. Venerable “Data Warehouse Design Solutions” Wiley 1998;
C. Adamson “Mastering Data Warehouse Aggregates” Wiley 2006,
L. Solverston “The Data Model Resource Book” Vol 2, revised edition, Wiley 2001;
C. Imhoff, N. Galemmo, J. Geiger “Mastering Data Warehouse Design” Wiley 2003, (Calendar case).
relevant here as you develop your term project dimensional models. Cases covered in the readings for this
course include:
Case 1- Wall Mart comprehensive example of DM (current in 2000),
Case 2- Retail Introduction, a simple model elaboration for reviewing basic DM concepts,
Case 3- Inventory, with essentials of a value chain, snapshots, and conformed dimensions,
Case 4- Procurement, providing basic approach to tracking slowly changing dimensions,
Case 5- Order Management, reviewing more details of DM
Case 6- CRM, introducing mini-dimensions, hierarchies, and presentation views,
Case 7- Accounting, presenting GL, budgeting, and fiscal calendars (see related Calendar case on CD),
Case 8- HR, introducing keywords as outrigger, and the audit dimension with exposed metadata,
Case 9- Banking with bridge, value banding, point in time balances, and heterogeneous product schemas,
Case10-Telecom with realistic design review example, geographic location dimension elaboration,
Case11-Transportation with fact tables at different granularity levels, time vs. date and time zones,
Case12-University with accumulating snapshots, fact-less fact tables, and handling of non existing events,
Case13-Health Care with additional details re roles, multi-valued dimensions, and complex events,
Case14-Electronic Commerce, with click stream analysis and web systems profitability,
Case15-Insurance: requirements traceability, and mini-dimensions for large, rapidly changing
dimensions.
Case16- Production
Case17- US Census
Recommended related readings (essential for DW professionals):
R. Kimball, M. Ross”The Data Warehouse Toolkit”3ed, Wiley 2013,
C. Adamson, M. Venerable “Data Warehouse Design Solutions” Wiley 1998;
C. Adamson “Mastering Data Warehouse Aggregates” Wiley 2006,
L. Solverston “The Data Model Resource Book” Vol 2, revised edition, Wiley 2001;
C. Imhoff, N. Galemmo, J. Geiger “Mastering Data Warehouse Design” Wiley 2003, (Calendar case).
Appendix
Instructions for the project work regarding Logical Modeling and Design
By now, you should have read M4 materials including some related readings and cases.
The task is to provide detail Logical Design as Dimensional Models for your term project in a fully
documented form for a design review.
The grade will be assessed, at the end of semester (using a model report card), so use the opportunity to
get feedback ASAP.
To prepare solid logical design on your project you have to specifically address dimensional dynamics
scenarios (Types I to IV) in the solution as well as to indicate views for materialization graphically (see
M4-LectureNotes for Erwin style sheets). All dimensions are to be confirmed (consolidated and fully de-
normalized across data marts at the team project level), and represented within an integrated model using
star schemas (snowflake schemas or a consolidated constellation schema) indicating all necessary
secondary facts/bridges, mini dimensions and secondary dimension outliers. Reflect on the MDM
discussion and compare confirmed dimensions with MDM stars. The project documentation should
include definitions of all entities and attributes, a note regarding query requirements from WIPs (with
reference to views for materialization, at the designed level of aggregation, represented in a model
graphically and not yet as SQL queries to create them). For feedback please e-mail using general
discussion area.
In order to fully prepare for this assignment it is not sufficient just to read chapters 8 and 9, it is necessary
to get exposure to practical in depth examples (as a substitute for real world experiences) via selected case
studies from the industry and/or professional literature (presented in M4). The aim of the project exercise
is to provide broad and realistic exposure to practical details such as degenerate dimensions, junk
dimensions, fact-less fact tables, snapshots of various kinds, bridge dimensions, secondary dimensions,
and in particular the application oriented views (those are views aimed at hiding design complexity from
the users). Related to this is the M4 Assignment 1 (really a preparatory practice which is probably best
performed in a distributed fashion, collated, reviewed and submitted as a team).
Note that this module requires above average study effort as it is itself equivalent to a few days of industrial type
DW design course such as Kimball’s dimensional modeling course. It is rather necessary for students to marry a
grounded academic approach (DFM and data vault/anchor model) with such industrial type practical in depth
experiences for a fully effective preparation for the real world of professional work in DW area and for the success
on the DW design term project.
For feedback submit the draft (work in progress) as M7-M4LDM-Team.docx and upload as
attachment.
Instructions for the project work regarding Logical Modeling and Design
By now, you should have read M4 materials including some related readings and cases.
The task is to provide detail Logical Design as Dimensional Models for your term project in a fully
documented form for a design review.
The grade will be assessed, at the end of semester (using a model report card), so use the opportunity to
get feedback ASAP.
To prepare solid logical design on your project you have to specifically address dimensional dynamics
scenarios (Types I to IV) in the solution as well as to indicate views for materialization graphically (see
M4-LectureNotes for Erwin style sheets). All dimensions are to be confirmed (consolidated and fully de-
normalized across data marts at the team project level), and represented within an integrated model using
star schemas (snowflake schemas or a consolidated constellation schema) indicating all necessary
secondary facts/bridges, mini dimensions and secondary dimension outliers. Reflect on the MDM
discussion and compare confirmed dimensions with MDM stars. The project documentation should
include definitions of all entities and attributes, a note regarding query requirements from WIPs (with
reference to views for materialization, at the designed level of aggregation, represented in a model
graphically and not yet as SQL queries to create them). For feedback please e-mail using general
discussion area.
In order to fully prepare for this assignment it is not sufficient just to read chapters 8 and 9, it is necessary
to get exposure to practical in depth examples (as a substitute for real world experiences) via selected case
studies from the industry and/or professional literature (presented in M4). The aim of the project exercise
is to provide broad and realistic exposure to practical details such as degenerate dimensions, junk
dimensions, fact-less fact tables, snapshots of various kinds, bridge dimensions, secondary dimensions,
and in particular the application oriented views (those are views aimed at hiding design complexity from
the users). Related to this is the M4 Assignment 1 (really a preparatory practice which is probably best
performed in a distributed fashion, collated, reviewed and submitted as a team).
Note that this module requires above average study effort as it is itself equivalent to a few days of industrial type
DW design course such as Kimball’s dimensional modeling course. It is rather necessary for students to marry a
grounded academic approach (DFM and data vault/anchor model) with such industrial type practical in depth
experiences for a fully effective preparation for the real world of professional work in DW area and for the success
on the DW design term project.
For feedback submit the draft (work in progress) as M7-M4LDM-Team.docx and upload as
attachment.
1 out of 25
Your All-in-One AI-Powered Toolkit for Academic Success.
+13062052269
info@desklib.com
Available 24*7 on WhatsApp / Email
Unlock your academic potential
© 2024 | Zucol Services PVT LTD | All rights reserved.