INF30004: ETL Process Importance, Challenges, and Data Stewardship

<Student’s Name>
<Instructor’s Name>
<Course Name>
31 August 2024
Business Intelligence and Data Visualization
Task D (20%)
Importance and need for the ETL process
ETL, an abbreviation of extraction, transformation, and loading, refers to the actions
taken to extract data from a given source system, transform it into a usable format, and
load it into the data warehouse. In practice, ETL also entails transporting datasets
between systems.
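To make the three steps concrete, the short Python sketch below shows one minimal way
such a pipeline could be structured. It is an illustration only, not a description of any
specific system: the SQLite connections, the table names (sales_raw, sales_fact), and the
currency-normalizing transformation are assumptions introduced for this example.

    # Minimal ETL sketch (illustrative only): extract rows from a source
    # database, apply a simple transformation, and load them into a
    # warehouse table. Table and column names are hypothetical.
    import sqlite3

    def extract(source_conn):
        # Pull the raw data points of interest from the source system.
        return source_conn.execute(
            "SELECT order_id, amount, currency FROM sales_raw"
        ).fetchall()

    def transform(rows):
        # Example transformation: normalize currency codes to upper case.
        return [(order_id, amount, currency.upper())
                for order_id, amount, currency in rows]

    def load(warehouse_conn, rows):
        # Insert the transformed rows into the warehouse fact table.
        warehouse_conn.executemany(
            "INSERT INTO sales_fact (order_id, amount, currency) VALUES (?, ?, ?)",
            rows,
        )
        warehouse_conn.commit()

    if __name__ == "__main__":
        src = sqlite3.connect("source.db")     # source system
        wh = sqlite3.connect("warehouse.db")   # data warehouse
        load(wh, transform(extract(src)))

Real warehouse loads add scheduling, error handling, and far larger volumes, but the
extract-transform-load shape stays the same.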
ETL methods and tasks have existed for many years and are therefore not unique to data
warehouse environments. ETL borrows heavily from wider IT practice, where many database
packages share data between systems or applications, integrating that data to present
different views of the world.
To support regular business analysis, those responsible for data are expected to ensure
that it is loaded into the warehouse so that it can serve its intended decision-making
purposes (Molina-Solana et al. 2017). This is only possible when data from different
sources is extracted and copied into the data warehouse. However, one long-standing
limitation of the warehouse is the integration, rearrangement, and consolidation of large
datasets across systems to provide a unified information base for business intelligence.
Data extraction occurs when the data points of interest are pulled from several sources,
typically database systems and other applications (Leskovec, Rajaraman, and Ullman 2020).
Notably, the specific data of interest may not be extracted in isolation; instead, a
broader set of data points is extracted so that the desired ones can be identified at a
later stage. Some data transformation may also take place during extraction, although this
depends heavily on the available operating system resources. Depending on the size of the
business, the extracted data can range from hundreds of kilobytes to gigabytes, and the
time taken to extract it can likewise vary from days or hours down to minutes or near
real-time. Log files for busy Web servers, for example, can grow to hundreds of megabytes
within a short period.
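Because the extracted volume can be large and the points of interest are often selected
later, extraction is commonly done in batches. The sketch below illustrates this idea; the
web_server_logs table name and the batch size are assumptions made for the example.

    # Extraction sketch (illustrative): pull a broad set of columns in
    # batches so the specific data points of interest can be identified
    # downstream, without holding the full result set in memory.
    import sqlite3

    BATCH_SIZE = 10_000

    def extract_in_batches(conn, table="web_server_logs"):
        cursor = conn.execute(f"SELECT * FROM {table}")
        while True:
            batch = cursor.fetchmany(BATCH_SIZE)
            if not batch:
                break
            yield batch  # later steps decide which fields actually matter

    if __name__ == "__main__":
        conn = sqlite3.connect("source.db")
        for batch in extract_in_batches(conn):
            print(f"extracted {len(batch)} rows")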
Before data preprocessing can take place, the extracted data must be transported to the
target systems. Some data transformation can also occur during transportation; for
instance, a SQL statement that accesses data on a remote target system through a gateway
may concatenate two columns as part of the SELECT statement (Spyker, Szabo, and Yao 2019).
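As a rough illustration of transformation happening during transport, the sketch below
concatenates two columns inside the SELECT that moves the data, so the rows arrive at the
target partly transformed. A local SQLite file stands in for the remote system and
gateway, and the customers table and its columns are assumptions for the example.

    # Transformation during transport (illustrative): the SELECT used to
    # move the data already concatenates first_name and last_name, so the
    # combined column never has to be built separately at the target.
    import sqlite3

    remote = sqlite3.connect("remote_source.db")  # stand-in for a gateway
    rows = remote.execute(
        "SELECT customer_id, first_name || ' ' || last_name AS full_name "
        "FROM customers"
    ).fetchall()
    # 'rows' can now be loaded into the target system with the two name
    # columns already combined.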
Potential problems that may be encountered while performing ETL within the Sunshine Group
One of the main challenges observed lies in designing and maintaining the ETL process,
which is highly resource-intensive in data warehouse projects.
The ETL process is extensive, covering the extraction of data, the transformation of data,
and the loading of datasets. Beyond these core steps, a successful implementation, which
is a core aspect of a data warehouse, requires additional tasks and extra support to keep
the ETL process running well. Designing the data warehouse and its data flows also
requires specific ETL tools such as Oracle Warehouse Builder (OWB), which may not be
available when needed and can therefore delay the overall process. Other tools, such as
the Oracle database on its own, do not produce a complete ETL solution, so additional
tools are required to customize it, which at times makes the process complex.
Data loads and transformations must be scheduled and processed in a specific order. All
results must be tracked so that ETL successes and failures can be detected, and in some
cases operations must be restarted, which makes the process difficult. The ETL process
also depends on tools such as Oracle Warehouse Builder to define the business workflow of
the operations, and it is inadequate without these extra tools.
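A simple way to picture this ordering and tracking requirement is a runner that executes
the steps in sequence, records the outcome of each, and skips already-completed steps when
a failed run is restarted. The sketch below only illustrates that idea; the step names,
placeholder callables, and status file are assumptions, not part of any particular tool.

    # Load-scheduling sketch (illustrative): run the steps in a fixed
    # order, record success or failure for each, and allow a failed run
    # to be restarted from the first step that did not finish.
    import json
    import os

    STATUS_FILE = "etl_status.json"

    def load_status():
        if os.path.exists(STATUS_FILE):
            with open(STATUS_FILE) as f:
                return json.load(f)
        return {}

    def run_pipeline(steps):
        status = load_status()
        for name, func in steps:            # order matters
            if status.get(name) == "success":
                continue                    # finished in an earlier run
            try:
                func()
                status[name] = "success"
            except Exception as exc:
                status[name] = f"failed: {exc}"
                break                       # later steps depend on this one
            finally:
                with open(STATUS_FILE, "w") as f:
                    json.dump(status, f)
        return status

    if __name__ == "__main__":
        steps = [
            ("extract_sales", lambda: None),    # placeholder callables
            ("transform_sales", lambda: None),
            ("load_sales_fact", lambda: None),
        ]
        print(run_pipeline(steps))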
Finally, to build and maintain a high level of trust in the information held in the
warehouse, it must be possible to reconstruct individual records at any given point in
time, both now and for future changes.
Role of data stewardship in the data warehouse environment
Data stewardship involves stewards who are tasked with defining, developing, and
implementing policies and operating procedures for the day-to-day administrative and
operational management of data and systems. These include, but are not limited to, data
storage, intake, processing, and transmission to both internal and external systems. One
role of data stewardship in the data warehouse environment is the management of data drawn
from a variety of sources; in this role, data stewards ensure that data sources are
protected against insecure handling, including poor storage. The stewardship function also
ensures the quality of the collected data, its storage, and its utilization. A further
role is data documentation and adherence to rules covering data collection, storage, and
use, with all policies and rules surrounding data utilization enforced by the data
stewards. In summary, data stewards are responsible for the management of all data within
and used by the enterprise and for ensuring that the data-related rules established by the
data governance program are followed and maintained.
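One practical way stewards turn such policies into something enforceable is to express
them as validation rules applied before data enters the warehouse. The sketch below shows
the idea; the two rules and the field names are hypothetical examples, not rules taken
from the report.

    # Data-quality sketch (illustrative): steward-defined rules applied to
    # incoming rows, separating acceptable records from rejected ones.
    def completeness_rule(row):
        # Required fields must be present and non-empty.
        return all(row.get(field) not in (None, "") for field in ("id", "email"))

    def range_rule(row):
        # Values must fall within an agreed range.
        return 0 <= row.get("age", 0) <= 120

    RULES = [completeness_rule, range_rule]

    def validate(rows):
        accepted, rejected = [], []
        for row in rows:
            (accepted if all(rule(row) for rule in RULES) else rejected).append(row)
        return accepted, rejected

    if __name__ == "__main__":
        sample = [{"id": 1, "email": "a@b.com", "age": 34},
                  {"id": 2, "email": "", "age": 150}]
        ok, bad = validate(sample)
        print(len(ok), "accepted;", len(bad), "rejected")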
Works Cited
Leskovec, Jure, Anand Rajaraman, and Jeffrey David Ullman. Mining of Massive Datasets.
Cambridge University Press, 2020.
Molina-Solana, Miguel, et al. "Data Science for Building Energy Management: A Review."
Renewable and Sustainable Energy Reviews 70 (2017): 598-609.
Spyker, James D., Victor L. Szabo, and Yongfeng Yao. "Replicating Structured Query
Language (SQL) in a Heterogeneous Replication Environment." U.S. Patent No. 10,366,105.
30 Jul. 2019.