The Distributed Data Warehouse

Added on 2024/04/03

AI Summary

This course delves into the concept of distributed data warehouses, exploring the different types such as local and global data warehouses. It covers the advantages, disadvantages, and the need for data discretization in the context of managing data across distributed environments.

Distributed Data Warehouse
• Most organizations build and maintain a single
centralized data warehouse environment. This setup
makes sense for many reasons:
• The data in the warehouse is integrated across the
corporation, and an integrated view is used only at
headquarters.
• The corporation operates on a centralized business
model.
• The volume of data in the data warehouse is such
that a single centralized repository of data makes
sense.
• Even if data could be integrated, if it were dispersed
across multiple local sites, it would be cumbersome
to access.

Types of Distributed Data Warehouses
The three types of distributed data warehouses are as follows:
1. The business is distributed geographically or over multiple, differing product lines. In this
case there is what can be called a local data warehouse and a global data warehouse. The
local data warehouse represents data and processing at a remote site, and the global
data warehouse represents the part of the business that is integrated across the
business.
2. The data warehouse environment will hold a lot of data, and the volume of data will be
distributed over multiple processors. Logically there is a single data warehouse, but physically
there are many data warehouses that are all tightly related but reside on separate
processors. This configuration can be called the technologically distributed data warehouse.
3. The data warehouse environment grows up in an uncoordinated manner: first one data
warehouse appears, then another. The lack of coordination in the growth of the different
data warehouses is usually a result of political and organizational differences. This case can
be called the independently evolving distributed data warehouse.
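
The technologically distributed case (type 2 above) can be illustrated with a minimal sketch. It assumes rows are routed to processors by a stable hash of a key; the key name `customer_id`, the node count, and the helper names are hypothetical and not taken from the source.

```python
import zlib


def node_for(key: str, node_count: int) -> int:
    """Pick a processor for a row using a stable hash of its key."""
    return zlib.crc32(key.encode("utf-8")) % node_count


def partition(rows, node_count):
    """Spread rows over node_count physical partitions (one list per processor)."""
    nodes = [[] for _ in range(node_count)]
    for row in rows:
        nodes[node_for(row["customer_id"], node_count)].append(row)
    return nodes


# Hypothetical sample data: twelve rows keyed by an assumed customer_id column.
rows = [{"customer_id": f"C{i:03d}", "amount": i * 10} for i in range(12)]
nodes = partition(rows, node_count=3)

# The logical warehouse is the union of the physical partitions,
# so a query over "the" warehouse visits every processor.
logical_total = sum(row["amount"] for node in nodes for row in node)
```

Hash routing is only one possible placement rule; range or round-robin placement would serve the same illustration.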

Local and Global Data Warehouses
• When a corporation is spread around the world, information is needed both locally
and globally. The global needs for corporate information are met by a central data
warehouse where information is gathered. But there is also a need for a separate
data warehouse at each local organization (that is, in each country). In this case, a
distributed data warehouse is needed. Data will exist both centrally and in a
distributed manner.
• A second case for a local and global distributed data warehouse occurs when a
large corporation has many lines of business. Although there may be little or no
business integration among the different vertical lines of business, at the corporate
level there is, at least as far as finance is concerned. The different lines of business
may not meet anywhere else but at the balance sheet, or there may be considerable
business integration, including such things as customers, products, vendors, and the
like. In this scenario, a corporate centralized data warehouse is supported by many
different data warehouses for each line of business.
• In some cases, part of the data warehouse exists centrally (that is, globally), and other
parts of the data warehouse exist in a distributed manner (that is, locally).

The Local Data Warehouse
A form of data warehouse, known as a local data
warehouse, contains data that is of interest only at the
local level. There might be a local data warehouse for
Brazil, one for France, and one for Hong Kong. Or there
might be one local data warehouse each for car parts,
motorcycles, and heavy trucks. Each local data
warehouse has its own technology, its own data, its own
processor, and so forth. The local data warehouse
serves the same function that any other data
warehouse serves, except that the scope of the
data warehouse is local. For example, the data
warehouse for Brazil does not have any information
about business activities in France, and the data
warehouse for car parts does not have any data
about motorcycles. In other words, the local data
warehouse contains data that is historical in nature
and is integrated within the local site. There is no
coordination of data or structure of data from one
local data warehouse to another.

Local Data Warehouse
• Activity occurs at the local level.
• The bulk of the operational processing is done at the local level.
• The local site is autonomous.
• Each local data warehouse has its own unique architecture and data contents.
• The data is unique and of prime importance to that locality only.
• The majority of the records are local and not replicated.
• Any intersection of data between local data warehouses is coincidental.
• Local warehouses serve different technical communities.
• The scope of a local data warehouse is limited to the local site.
• Local warehouses also include historical data and are integrated only within the
local site.

Global Data Warehouse
• The global data warehouse contains information that must be integrated at the
corporate level.
• In many cases, this consists only of financial information.
• In other cases, this may mean integration of customer information, product
information, and so on.
• While a considerable amount of information will be peculiar to and useful only at
the local level, other common corporate information will need to be shared and
managed corporately.
• The global data warehouse contains the data that needs to be managed
globally.
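
The split between local-only data and corporately shared data can be sketched as follows. This is an illustration under stated assumptions, not a prescribed design: the site names come from the text, while the field names (`period`, `revenue`, `store_promo`), the sample figures, and the assumption that revenue is already expressed in one common currency are hypothetical.

```python
from collections import defaultdict

# Hypothetical local warehouses. "store_promo" stands in for data that is
# useful only locally; revenue is assumed to be in a common currency already.
local_warehouses = {
    "Brazil": [
        {"period": "2024-01", "revenue": 120, "store_promo": "BR-17"},
        {"period": "2024-02", "revenue": 80, "store_promo": "BR-21"},
    ],
    "France": [
        {"period": "2024-01", "revenue": 200, "store_promo": "FR-03"},
    ],
}


def extract_for_global(site, rows):
    """Keep only the fields that must be integrated at the corporate level."""
    return [
        {"site": site, "period": row["period"], "revenue": row["revenue"]}
        for row in rows
    ]


def load_global(warehouses):
    """Gather the corporate-level extract from every local site."""
    global_rows = []
    for site, rows in warehouses.items():
        global_rows.extend(extract_for_global(site, rows))
    return global_rows


def corporate_revenue(global_rows):
    """Integrated financial view: revenue per period across all sites."""
    totals = defaultdict(int)
    for row in global_rows:
        totals[row["period"]] += row["revenue"]
    return dict(totals)


global_warehouse = load_global(local_warehouses)
revenue_by_period = corporate_revenue(global_warehouse)
```

The local-only field never reaches the global warehouse, which mirrors the point that only information needing corporate management is held globally.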

Disadvantages of Distributed Data Warehousing
• How frequently will data be transferred from the local environment to the global
environment? Daily? Weekly? Monthly? The rate of transfer depends
on a combination of factors. How quickly is the data needed in the global data
warehouse? How much activity has occurred at the local level? What volume of
data is being transported?
• Is the transportation of the data from the local environment to the global data
warehouse across national lines legal?
• What network will be used to transport the data from the local environment to
the global environment? Is the Internet safe enough? Is it reliable enough? Can
the Internet safely transport enough data? What is the backup strategy? What
safeguards are in place to determine whether all of the data has been passed?
• What safeguards are in place to determine whether data is being hacked during
transport from the local environment to the global environment?
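
One possible safeguard for checking that all of the data has been passed is to send a small manifest alongside each batch. The sketch below assumes a row count plus a SHA-256 digest over a canonical JSON serialization; the function names and sample rows are hypothetical.

```python
import hashlib
import json


def manifest(rows):
    """Summarize a batch as a row count plus a SHA-256 digest of its contents."""
    digest = hashlib.sha256()
    for row in rows:
        # sort_keys gives each row a canonical form; row order still matters.
        digest.update(json.dumps(row, sort_keys=True).encode("utf-8"))
        digest.update(b"\n")
    return {"row_count": len(rows), "sha256": digest.hexdigest()}


def transfer_is_complete(sent_manifest, received_rows):
    """True only if the received batch matches what the local site sent."""
    return manifest(received_rows) == sent_manifest


# Hypothetical batch leaving a local site.
batch = [
    {"site": "Brazil", "period": "2024-01", "revenue": 120},
    {"site": "Brazil", "period": "2024-02", "revenue": 80},
]
sent = manifest(batch)

complete = transfer_is_complete(sent, batch)
truncated = transfer_is_complete(sent, batch[:-1])
```

A plain digest like this catches missing or corrupted rows. It does not by itself protect against deliberate tampering unless the manifest travels over a trusted channel or is signed, for example with a keyed hash.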

Data Discretization
• Discretization is the process of transferring
continuous functions, models, variables, and
equations into discrete counterparts. This process is
usually carried out as a first step toward making
them suitable for numerical evaluation and
implementation on digital computers.
• Data discretization is defined as the process of
converting continuous attribute values into a
finite set of intervals with minimal loss of information
and associating each interval with a specific
data value or conceptual label.

Why is it needed?
• It improves the quality of discovered knowledge.
• It makes the data easier to maintain.
• Many data mining (DM) algorithms can only deal with discrete
attributes and therefore require discretized data.
• It reduces the running time of various data mining tasks such as
association rule discovery, classification, and prediction.
• It prepares data for further analysis, e.g., classification.
• Discretization is considered a data reduction mechanism because it
reduces data from a large domain of numeric values to a subset
of categorical values.

Steps of Discretization
• Step 1: Sorting the continuous values of the feature to be
discretized.
• Step 2: Evaluating a cut point for splitting or adjacent
intervals for merging.
• Step 3: Splitting or merging intervals of continuous values
according to some defined criterion.
• Step 4: Stopping at some point.
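
The four steps can be sketched with a simple splitting variant. The criterion used here (split at the widest gaps between neighbouring sorted values) and the parameters `max_intervals` and `min_gap` are assumptions chosen for illustration; the text does not prescribe a particular criterion or stopping rule.

```python
from bisect import bisect_right


def cut_points_by_gap(values, max_intervals, min_gap):
    """Return sorted cut points chosen at the widest gaps between sorted values."""
    # Step 1: sort the continuous values.
    ordered = sorted(values)
    # Step 2: evaluate candidate cut points (midpoints), scored by gap width.
    candidates = [
        (b - a, (a + b) / 2)
        for a, b in zip(ordered, ordered[1:])
        if b - a >= min_gap
    ]
    # Step 3: split at the widest gaps first (the assumed criterion).
    candidates.sort(reverse=True)
    # Step 4: stop once max_intervals intervals would exist.
    return sorted(cut for _, cut in candidates[: max_intervals - 1])


def interval_index(value, cuts):
    """Index of the interval that a value falls into, given sorted cut points."""
    return bisect_right(cuts, value)


ages = [1, 5, 9, 4, 7, 11, 14, 17, 13, 18, 19,
        31, 33, 36, 42, 44, 46, 70, 74, 78, 77]
cuts = cut_points_by_gap(ages, max_intervals=3, min_gap=5)
```

With these assumed parameters the age values split at the two widest gaps (between 19 and 31, and between 46 and 70). A merging variant would start from one interval per value and join adjacent intervals instead.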

Binning
• Binning is a data smoothing technique that groups a
large number of continuous values into a smaller number of bins. For
example, if we have data about a group of students, we can
arrange their marks into a smaller number of intervals by
making one bin per grade: one bin for grade A, one for grade B,
one for C, one for D, and one for F.
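
The grade example can be sketched as below. The boundaries (60, 70, 80, 90) and the sample marks are hypothetical, since the text names the bins but not their limits.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical grade boundaries: a mark at or above a cut moves up one bin.
GRADE_CUTS = [60, 70, 80, 90]
GRADE_LABELS = ["F", "D", "C", "B", "A"]


def grade_for(mark):
    """Map a continuous mark to its grade bin."""
    return GRADE_LABELS[bisect_right(GRADE_CUTS, mark)]


# Hypothetical marks for a group of students.
marks = [95, 82, 67, 58, 73, 88, 91, 45]
grades = [grade_for(mark) for mark in marks]
bin_sizes = Counter(grades)
```

Eight distinct marks collapse into five labels, which is the reduction that binning is meant to achieve.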

• Suppose we have an attribute Age with the following values:
Age: 1, 5, 9, 4, 7, 11, 14, 17, 13, 18, 19, 31, 33, 36, 42, 44, 46, 70, 74, 78, 77

Table: Age before and after discretization
Attribute               Age             Age                      Age                      Age
Before discretization   1, 5, 4, 9, 7   11, 14, 17, 13, 18, 19   31, 33, 36, 42, 44, 46   70, 74, 77, 78
After discretization    Child           Young                    Mature                   Old
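
The grouping in the age table can be reproduced with a short sketch. The upper bounds used here (10, 30, 60) are assumptions chosen to be consistent with the values listed; the source gives the labels but not the exact boundaries.

```python
# Assumed upper bounds for each label; anything above the last bound is "Old".
AGE_BANDS = [(10, "Child"), (30, "Young"), (60, "Mature")]


def label_for(age):
    """Replace a numeric age with its conceptual label."""
    for upper_bound, label in AGE_BANDS:
        if age <= upper_bound:
            return label
    return "Old"


ages = [1, 5, 9, 4, 7, 11, 14, 17, 13, 18, 19,
        31, 33, 36, 42, 44, 46, 70, 74, 78, 77]

# Group the raw values under their labels, keeping the original order.
groups = {}
for age in ages:
    groups.setdefault(label_for(age), []).append(age)
```

Each of the 21 numeric values is replaced by one of four labels, so the attribute moves from a numeric domain to a small categorical one.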