Challenges and Granularity in Business Intelligence Data Warehouse
Summary
This report examines two critical aspects of business intelligence and data warehousing. It first describes the challenges encountered in creating a data warehouse, focusing on real-time ETL and on the problems that arise when OLAP queries run over changing data, together with techniques for overcoming them, such as "near real-time" ETL, direct trickle feed, and external real-time data caches. It then addresses data granularity: its significance in data warehouse design and implementation, and how to determine the appropriate level of granularity for a given scenario by weighing storage overhead against flexibility of data processing.

Running Head: BUSINESS INTELLIGENCE
BUSINESS INTELLIGENCE
Name of the Student
Name of the University

Task 1: Issues that make the creation of a data warehouse difficult in the given scenario
Nowadays every organization uses databases as the centerpiece of gathering and storing its information. The idea of data warehousing is straightforward: data are extracted from one or more source databases and loaded into another database for further analysis and use. A data warehouse is generally designed to meet several requirements, such as holding non-operational data and standardizing data. Because the warehouse receives data from many different sources, those data may not use the same units or definitions; to make the datasets match, they are converted into a standard format. This conversion process is known as extraction-transformation-load (ETL).
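The standardization step can be sketched as follows. This is a minimal illustration only; the source names, field names, and unit conversions are hypothetical, not taken from any real system:

```python
# Minimal ETL transform sketch: standardize records from two hypothetical
# source feeds that use different field names and different units before
# loading them into a common warehouse format.
def transform(record: dict, source: str) -> dict:
    if source == "sales_eu":
        # This feed reports amounts in cents under the key 'amt'.
        return {"customer": record["cust"], "amount": record["amt"] / 100.0}
    elif source == "sales_us":
        # This feed already reports whole currency units under 'amount'.
        return {"customer": record["customer"], "amount": float(record["amount"])}
    raise ValueError(f"unknown source: {source}")

def etl(batches: dict) -> list:
    """Extract records from each source, transform them to the common
    format, and return the rows ready to load into the warehouse."""
    rows = []
    for source, records in batches.items():
        rows.extend(transform(r, source) for r in records)
    return rows
```

In a batch-mode ETL system this function would run once per nightly load; the real-time challenge discussed next is running it continuously.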
The first challenge is enabling real-time ETL. Performing extraction, transformation, cleaning and loading in real time is difficult because most ETL tools and systems, whether custom-coded or off-the-shelf, operate in batch mode. Batch loading typically involves warehouse downtime, during which users cannot access the system. Since these loads are normally performed late at night, the planned downtime usually does not inconvenience many users (Castellanos et al., 2015). When data are loaded continuously in real time, however, there can be no system downtime. There are nevertheless methods for adapting existing ETL systems to perform real-time or near-real-time warehouse loading; some of these tools and techniques are listed below.
There are several techniques by which this issue can be addressed:
"Near real-time" ETL
Direct trickle feed
Trickle & Flip
External Real-time Data Cache
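The trickle-and-flip idea from the list above can be sketched as a toy in-memory version. The class and structure are illustrative assumptions, not code from any real ETL product: incoming rows trickle into a staging copy while queries read a stable live copy, and the two are periodically "flipped" so readers see fresh data without mid-query changes.

```python
import threading

class TrickleAndFlip:
    """Toy sketch of the "Trickle & Flip" technique: real-time rows are
    appended to a staging table, and a periodic flip makes them visible
    to queries atomically."""

    def __init__(self):
        self.live = []       # the table queries read from
        self.staging = []    # the table the real-time feed writes to
        self._lock = threading.Lock()

    def trickle(self, row):
        """Continuously load one incoming row into staging."""
        with self._lock:
            self.staging.append(row)

    def flip(self):
        """Publish the staged rows in one atomic step and start a new
        empty staging area."""
        with self._lock:
            self.live = self.live + self.staging
            self.staging = []

    def query(self):
        """Read a consistent copy of the live table."""
        with self._lock:
            return list(self.live)
```

A real warehouse would flip by swapping table names or partitions rather than copying lists, but the principle is the same: queries never observe a half-loaded batch.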
The second challenge that can create difficulty in the data warehouse is OLAP queries versus changing data. Query and OLAP tools were designed to work over unchanging, static historical data. Because they assume the underlying data are not changing, they take no precautions to ensure that the results they produce are not adversely affected by data changes concurrent with query execution. In some cases this can lead to inconsistent and confusing query results.

A multi-pass SQL statement is made up of many smaller SQL statements that operate in sequence on a set of temporary tables. Relational OLAP tools are especially sensitive to this problem because they perform all but the simplest data analysis operations by issuing multi-pass SQL.

The first issue is that the results of a query that takes even one minute are arguably no longer exactly real-time.
The second issue is that, given the multiple passes of SQL required to perform any relational OLAP reporting or analytical operation, a real-time warehouse is likely to suffer from the result-set internal-inconsistency problem discussed above.
The techniques that can be used to address these issues are:
A near-real-time approach
True real-time risk mitigation
Use of an external real-time data cache
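The internal-inconsistency problem, and the snapshot-style mitigation behind an external real-time data cache, can be illustrated with a toy two-pass query. The function names and the per-product/grand-total computation are hypothetical examples, not drawn from any particular OLAP tool:

```python
def per_product_totals(rows):
    """Pass 1: aggregate amounts per product into a temporary mapping."""
    totals = {}
    for product, amount in rows:
        totals[product] = totals.get(product, 0) + amount
    return totals

def grand_total(rows):
    """Pass 2: compute the grand total across all rows."""
    return sum(amount for _, amount in rows)

def product_shares(live_table):
    """Run both passes against one frozen snapshot of the data.

    If each pass read the live table directly, a row loaded between
    pass 1 and pass 2 would inflate the grand total but not the
    per-product totals, so the shares would no longer sum to 1 -- the
    result-set internal-inconsistency problem described above."""
    snapshot = list(live_table)          # freeze the data once
    totals = per_product_totals(snapshot)
    grand = grand_total(snapshot)
    return {p: t / grand for p, t in totals.items()}
```

An external real-time data cache achieves the same effect at system scale: queries run against a stable copy while continuous loads land elsewhere.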
Task 2: The most appropriate level of granularity for the data warehouse
The question of granularity frequently comes up during data warehouse design, and the answer depends on the requirements. The granularity of data refers to the size into which data fields are subdivided. If the job at hand is to build an enterprise data warehouse that stores historical data and can answer any question anyone may have, then by all means make the granularity low (fine) and put everything you can into it.

For the given scenario, the appropriate level for our data warehouse is a finer (higher) granularity, which carries overheads for storage and for the input data (Lv, Zhou & Zhao, 2017). This shows up as a larger number of objects and methods in the object-oriented programming paradigm, or as more subroutine calls in procedural and parallel computing environments. It does, however, offer advantages in the flexibility of data processing, since every data field can be treated in isolation if required. A performance problem caused by excessive granularity may not reveal itself until scalability becomes an issue; finer granularity can also increase the number of database locks and thereby affect concurrency, which is why Adaptive Server supports locking at the page, table and row levels. By way of contrast, a postal address can be recorded, with coarse granularity, as a single field.
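The postal-address example can be sketched as two alternative record layouts. The field names are illustrative assumptions, chosen only to show the trade-off between coarse and fine granularity:

```python
from dataclasses import dataclass

@dataclass
class AddressCoarse:
    """Coarse granularity: the whole postal address is one field.
    Cheap to store and load, but any query on a component (e.g. city)
    requires parsing the free-text field."""
    address: str

@dataclass
class AddressFine:
    """Fine granularity: each component is its own field, at some
    storage and handling overhead, but every field can be queried
    in isolation."""
    street: str
    city: str
    postcode: str
    country: str

    def full(self) -> str:
        """Reassemble the coarse-grained form when needed."""
        return f"{self.street}, {self.city} {self.postcode}, {self.country}"

def customers_in_city(addresses, city):
    """A query the fine-grained layout supports directly."""
    return [a for a in addresses if a.city == city]
```

With the coarse layout, `customers_in_city` would instead have to parse each address string, which is exactly the loss of processing flexibility discussed above.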
References
Bouadi, T., Cordier, M. O., Moreau, P., Quiniou, R., Salmon-Monviola, J., & Gascuel-Odoux, C.
(2017). A data warehouse to explore multidimensional simulated data from a spatially
distributed agro-hydrological model to improve catchment nitrogen management.
Environmental Modelling & Software, 97, 229-242.
Castellanos, M., Dayal, U., Pedersen, T. B., & Tatbul, N. (Eds.). (2015). Enabling Real-Time
Business Intelligence: International Workshops, BIRTE 2013, Riva Del Garda, Italy,
August 26, 2013, and BIRTE 2014, Hangzhou, China, September 1, 2014, Revised
Selected Papers (Vol. 206). Springer.
Chen, C. P., & Zhang, C. Y. (2014). Data-intensive applications, challenges, techniques and
technologies: A survey on Big Data. Information Sciences, 275, 314-347.
Geary, N., Jarvis, B., Mew, C., & Gore, H. (2017). U.S. Patent No. 9,684,703. Washington, DC:
U.S. Patent and Trademark Office.
Kimball, R., & Ross, M. (2013). The data warehouse toolkit: The definitive guide to dimensional
modeling. John Wiley & Sons.
Lv, H., Zhou, L., & Zhao, Y. (2017, August). Classification of Data Granularity in Data
Warehouse. In Intelligent Human-Machine Systems and Cybernetics (IHMSC), 2017 9th
International Conference on (Vol. 2, pp. 118-122). IEEE.
Meehan, J., Zdonik, S., Tian, S., Tian, Y., Tatbul, N., Dziedzic, A., & Elmore, A. (2016,
September). Integrating real-time and batch processing in a polystore. In High
Performance Extreme Computing Conference (HPEC), 2016 IEEE (pp. 1-7). IEEE.
Mireku Kwakye, M. (2017). Modelling and Design of Generic Semantic Trajectory Data
Warehouse. Science.
Narra, L., Sahama, T., & Stapleton, P. (2015). Clinical data warehousing: A business analytics
approach for managing health data. In Proceedings of the Eighth Australasian Workshop
on Health Informatics and Knowledge Management (HIKM2015). Australian Computer
Society.
Rashmi, K. V., Shah, N. B., Gu, D., Kuang, H., Borthakur, D., & Ramchandran, K. (2013, June).
A Solution to the Network Challenges of Data Recovery in Erasure-coded Distributed
Storage Systems: A Study on the Facebook Warehouse Cluster. In HotStorage.
Renso, C., Roncato, A., & Trasarti, R. (2014, December). Mob-Warehouse: A semantic approach
for mobility analysis with a Trajectory Data Warehouse. In Advances in Conceptual
Modeling: ER 2013 Workshops, LSAWM, MoBiD, RIGiM, SeCoGIS, WISM, DaSeM,
SCME, and PhD Symposium, Hong Kong, China, November 11-13, 2013, Revised
Selected Papers (Vol. 8697, p. 127). Springer.
Vaisman, A., & Zimányi, E. (2014). Data Warehouse Systems: Design and Implementation.
Springer.