Wine Cellar Database: Database Technologies and Recommendations

Verified

Added on 2019/10/09

AI Summary

This report, prepared for a university module, examines database technologies and provides recommendations for The Wine Cellar, a growing wine and accessories company. The report begins by discussing the limitations of relational databases in the context of the company's expansion and the need for improved reporting. It then explores data warehousing, contrasting OLTP and OLAP systems and highlighting the benefits of OLAP for decision-making. The report also delves into the challenges of designing and implementing a data warehouse, including data quality and the choice of data models. Furthermore, the report addresses distributed databases, evaluating their relevance to The Wine Cellar and proposing a suitable type. Finally, it examines how relational databases support object-oriented development and multimedia requirements, concluding with recommendations for the company's database strategy. The report emphasizes the importance of efficient data storage and retrieval for supporting daily operations and strategic decision-making in a competitive business environment.

<University>
EXPLANATIONS AND RECOMMENDATIONS FOR
THE WINE CELLAR DATABASE
by
Student Number:
Module Name:
Submission Deadline:
15th March 2017
<Lecturer’s Name and Course Number>
 <Your Name> <Your Student Number> 2017 1 of 23

Paraphrase This Document

Need a fresh take? Get an instant paraphrase of this document with our AI Paraphraser

Table of Contents
Task 1.............................................................................................................................5
a) Limitations of Relational Databases for The Wine Cellar Scenario......................5
b) Data Warehouse and Their Support in Decision-Making......................................6
c) Issues in Designing and Implementing a Data Warehouse....................................8
d) Data Structure Model for The Wine Cellar Scenario.............................................9
Task 2.............................................................................................................................9
a) Distributed Databases and Potential Issues Before Final Design Decisions..........9
b) Relevance of Distributed Database for The Wine Cellar Organization and
Costs/Benefits...........................................................................................................12
c) Recommendation of Type of Distributed Database and Justifications................13
Task 3...........................................................................................................................13
a) How Relational Databases Support Object Oriented Development?...................13
b) Relational Databases Relevance for Multimedia Requirements in The Wine
Cellar Scenario.........................................................................................................16
Conclusion....................................................................................................................17
References....................................................................................................................19
 <Your Name> <Your Student Number> 2017 2 of 23

Data is the oil that runs the machinery of the present-day society. Data is everywhere,
and new is being generated every moment from many sources. In the present-day
competitive world of advanced technologies, databases and database technology are
vital to organizations for supporting the daily operations, as well as the strategic
decision making. Being able to store and retrieve data in an efficient way is an
essential requirement for the success of any organisation. Also, different databases are
optimised for different roles. Some may be optimised for read-heavy operations, while
others may be optimised for write-heavy operations. Optimising one parameter entails
a drawback in other parameters. Also, the underlying structure of a database system
has a significant bearing on the applications. This paper deals with the new
requirements of a (fictitious) company in England, which is engaged in sourcing wine
and accessories from all over the world and selling them. The company has
experienced growth in the recent years and has opened branches throughout the
country. However, the database handling is still keeping in mind a single location.
The consequence has been the delay and manual effort required in compiling
company-wide reports. In addition to the reporting requirement, the company also
wants to promote selected products via short videos on a website. This paper details
the various database technologies and provides recommendations for the new
scenarios of reporting and multimedia promotions which have cropped up for The
 <Your Name> <Your Student Number> 2017 3 of 23

Wine Cellar.
 <Your Name> <Your Student Number> 2017 4 of 23

Paraphrase This Document

Need a fresh take? Get an instant paraphrase of this document with our AI Paraphraser

Task 1
a) Limitations of Relational Databases for The Wine Cellar Scenario
In the relational database paradigm, data is represented in the form of tables, and
relations among the tables help enforce data consistency (Rouse 2006). In the process
of developing relational database schema for a scenario, the entities to be represented
in the database are identified first. Then, the parameters of these entities are
recognised and finally the relations among them, so that there is no duplication of data
and any information about any entity is linked and can be traced. Such a paradigm
allows for naturally enforcing constraints. This system was proposed in the 70s by
E.F. Codd (Codd 1982, pp.109-117). Relational Database is implemented via tables,
columns, rows, primary keys, and foreign keys. In other words, tables represent
entities (e.g. products, customers), columns represent parameters about the entities
(e.g. product name, manufacturing date, volume). The rows represent one particular
instance of the entity (e.g. all the details of a single person), a primary key uniquely
identifies each row in a table (e.g. social security number), and foreign keys map one
row to another to implement relations.
The key point to note from the preceding discussion is the foundation of the paradigm
is relations, and indeed the relational database management system (RDBMS) are
based on mathematical theories (Duffy 2011). Thus these systems are good at
enforcing data consistency, both at the individual column level and in between tables.
However, such systems are traditionally centralised, and this has both advantages and
disadvantages. Till a few years, the centralised system was serving The Wine Cellar
well, but the geographical distribution of the sources of data to different outlets call
for another solution. By centralisation, we mean that the entire database system - the
software to manage the data, the computer hardware, and the data are located on a
single machine in a single location. This type of a concentrated architecture has some
advantages and disadvantages. As we will see in a moment, the centralised systems
serve the outlets of The Wine Cellar well but fail when it comes to company-wide
manoeuvring.
 <Your Name> <Your Student Number> 2017 5 of 23

The current scenario of the company is that the outlets manage their data in a
centralised relational database system e.g. inventory, sales, tax, customer details and
other data. This centralisation serves them well as a centralised system has the
advantages of being faster, cheaper, and being easier to manage. However, this makes
interaction, specifically programmatic interaction with the head office a difficult
thing. The senior staff at the head office would be interested in knowing the up-to-
minute performance of their various branches and make strategic and tactical
decisions accordingly. However, as confirmed in the briefing for this paper, the
outlets submit a file each week which has to be manually analysed and results
compiled for the perusal of the senior staff. This process is cumbersome, slow and
prone to errors. In a nutshell, the centralisation of traditional relational database
systems and the consequent inability to allow access to multiple (physically separated)
parties is a limitation in the context of The Wine Cellar's growth. This situation is a
typical one in that the databases end up becoming islands of information with even
one department's system not being able to talk to another department (Martin n.d.).
b) Data Warehouse and Their Support in Decision-Making
Databases and database technology are vital to modern organizations for supporting
the daily operations, as well as the strategic decision making (Mannino, n.d.).We will
now explore two types of database systems - OLTP (Online Transaction Processing)
and OLAP (Online Analytical Processing), which have emerged due to the different
requirements and expectations from data within an organisation. Imagine a lake in
which water flows into from different streams. After a while of the streams pouring
their water into the lake, the lake begins to fill and will soon overflow. However,
before that happens, the deeper water is taken out and placed in a separate well which
has been designed to preserve this water. Now, in this comparison, the streams are the
routine transactions that are taking place regularly e.g. thousands of bank transactions
take place in a banking system every minute. The lake is the transactional database
that handles the operations and is a reflection of the current state of the organisation
(OLTP database). The movement of old data into a system which is specifically made
for maintaining archives is the OLAP database. OLTP is also referred to as
 <Your Name> <Your Student Number> 2017 6 of 23

operational data stores. OLAP is also referred to as a data warehouse. A transactional
database by its very nature does not make itself amenable to analytics (Cardon n.d.)
and thus is born the need for OLAP systems. It must be emphasised that OLTP is not
suitable for business intelligence or analytics (Why Business Intelligence on OLTP is
evil n.d.).
Technically, the OLTP has different goals that OLAP. OLTP is designed to support
the running of the organisation and performance is the main criteria. Its main focus is
on updates, which are short but multiple and are triggered by end-users. The queries
are standardised and few in variety. Here, the data consistency is paramount, and
usually high normalisation of data is maintained to enforce consistency and integrity
of data. OLTP is the storehouse of the original data. In contrast, OLAP is an archive
of historical data i.e. transactions that have taken place and are not likely to edited
further. Its focus is on retrieval (OLAP vs OLTP: what makes the difference n.d.).
Also, this consolidated data of OLAP comes from multiple places, and this
multiplicity of the sources often dictates that a separate step be executed (a staging
area) where the data is treated, and the format is modified before being saved in the
OLAP database. This operation is often referred to as ETL (Extract, Transform,
Load). Since OLAP deals with historical data (which is not expected to change), the
priority of the system and the scope of its service is poles apart from the transactional
one. The priority of an OLAP system is supporting complex and non-typical queries
in an ad-hoc manner, some of which are often long running. Many common and long-
running queries are executed in advance to precompute the reports that may be later
requested. Its scope is to help with planning, problem-solving and supporting
decision-making. Another aspect of OLAP is that data normalisation is dropped in
favour of data redundancy and simpler associations so that complex queries can be
executed faster.
Now, an OLAP or a data warehouse is characterised by being subject-oriented to a
particular metric (e.g. sales), integrated (using multiple sources of data), non-volatile
(data once entered into warehouse is not likely to be changed or deleted), and time
variant (being able to analyze how a metric has changed over time e.g. months or
 <Your Name> <Your Student Number> 2017 7 of 23

Paraphrase This Document

Need a fresh take? Get an instant paraphrase of this document with our AI Paraphraser

years) (Lane and Schupmann 2002). This availability of an ocean of data which is
disjoint from the transactional data, coupled with specialised tools, allows managers
to look into the past and ask questions to mine data, extract trends, see how metrics
are moving up or down, and then decide what is in the best interests of their
organisation. Such an arrangement also eases the database managers' responsibilities.
c) Issues in Designing and Implementing a Data Warehouse
As detailed in the previous section, a data warehouse is an ideal system to help
management look into the past and come up with answers to questions like the change
in a metric over time, hidden trends and other data mining questions. Still, the process
of establishing and maintaining a data warehouse is non-trivial. In this section, we
will talk about the challenges encountered in the implementation of a data warehouse.
One issue is of the data quality. In a data warehouse, the source of data are multiple
and not directly the end-users. Also, to help meet a data warehouse's goal of
uncovering significant trends, we cannot skip on data sources. Now, whenever
different sources are merged, errors happen in data due to the different format, the
level of quality, logic conflicts, missing data (Wentzlaff 2014). The reasons for the
low quality of data at the sources include incorrect information entered in the primary
source, incomplete knowledge of interdependencies among the data sources, inability
to cope with the ageing data, varying timelines of the data sources, and others (Pandey
2014, pp.18-24). Some practitioners suggest that not all low-quality data needs to be
cleaned. Sometimes, acknowledging the bounds of clean data is prudent (McKnight
2007). Another concern is the model to be chosen for the data warehouse. As pointed
out earlier, since the goals of the transactional database and a warehouse are poles
apart, therefore the underlying structure cannot be same. Thus the data models for
transactional are not suited for warehousing environments (Becker 2002, pp.22-76). A
multidimensional data model (MDDM) storing data as facts and dimensions as
opposed to rows and columns of a relational database are more suited to handle the
scope of a data warehouse - ad-hoc and complex queries joining a great many
numbers of entities. The schemas common are star schema and snowflake schema
 <Your Name> <Your Student Number> 2017 8 of 23

(Nepomnjashiy 2001). Then, there are some implementation issues which need to be
sorted out, like the amount of data that will be shipped around the network and the
network's capability to cope with it, the amount of disk space required, and its speed.
Also, expenses need to be tailored to the goals in a decision like solid-state drives
(SSDs) or traditional hard disk drives (Whitehorn 2011).
d) Data Structure Model for The Wine Cellar Scenario
A generic data warehouse architecture consists of three layers - data sources, data
storage area, and primary data warehouse (El-Sappagh, Hendawi, & El Bastawissy,
2011). This section of the report will look into the models for data warehouse that can
support the scenario in question. There are three types of schema and the application
of any one of them depends upon the scenario being handled. These schemas are star,
snowflake, and fact constellation schema. Star schemas are simple, and usually in
Third Normal Form (3NF), and thus present a single view of truth (Williams, n.d.). A
snowflake schema is a more complex version of the star model. Finally, the fact
constellation schema is the most complex of the three, and it is developed by splitting
a star schema into multiple star schemas by bringing in other dimensions.
As the problem in hand is a basic one, and only a small section of the problem set is
being considered (as in a data mart). Data marts are of interest to a set of users, or a
particular department of the organisation. For data marts, star schemas are
recommended (Drkušić, 2016). Thus, we will go with the star schema and present our
schema as follows:
Task 2
a) Distributed Databases and Potential Issues Before Final Design Decisions
 <Your Name> <Your Student Number> 2017 9 of 23

A distributed database is in contrast to a centralised database. A database which is
implemented across multiple computers, which may be spread apart geographically is
a distributed database system. Such systems are connected by a network (e.g.
Internet). Depending upon the underlying hardware and software, there are two types
of distributed systems - homogeneous and heterogeneous. Before we move further, it
is relevant to bear in mind that for the user, this distribution is transparent. For a
distributed database, this location transparency exists for the database administrators
also. As an example, consider email service by Google, Gmail. A Gmail user does not
know where in the world and on which servers his emails are being stored and
processed. We note that the data itself, as well as the processing required to serve a
query, may be spread across the multiple systems (which may even be in different
continents or across oceans) which are part of the distributed database. The sites
which make up a distributed database do not share any physical components and run
independently of each other. Some authors believe that the distribution of the
processing load among multiple computers may improve an end user's performance.
A homogenous distributed database system is one in which all participating sites have
the same software, are aware of each other and share the workload of processing user
requests. All of them agree that they will not change the schema or software without a
consensus. In contrast, in a heterogeneous distributed database system, the
participating sites have the freedom to use different software. They are also free to use
different schemas for the databases. As expected, such a difference gives rise to
technical challenges. Also, the sites may not be aware of each other and may
cooperate in a limited way only. Now, this distribution of data among the underlying
worker machines introduces the problems of data consistency and accuracy in the face
of physically separate computers which are possibly running different software and
schemas.
As is evident from the discussion, maintaining data consistency among the different
sites is a technical challenge. Techniques have been devised to ensure this and include
replication, fragmentation, or a combination of both. Replication is creating an exact
copy of the database at the various sites. Fragmentation is partitioning the database so
 <Your Name> <Your Student Number> 2017 10 of 23

Paraphrase This Document

Need a fresh take? Get an instant paraphrase of this document with our AI Paraphraser

that separate parts of the database are kept in separate sites. Fragmentation may be
horizontal (row-wise) or vertical (column-wise). Retrieving a complete record is then
a matter of querying the database sites concerned, compiling the results and
presenting to the user.
Distributed systems offer advantages, and they are more suited to certain scenarios.
Distributed systems mimic the organisational units as well as the physical separation
of the various branches of an organisation. Such sites also need to share data among
themselves. As in the case of The Wine Cellar, the company after experiencing
growth is facing delays and inefficiencies in retrieving data from the different outlets
because of the centralised nature of traditional database systems. Distributed systems
provide the hope of being able to remedy this. Furthermore, such databases support
transactional as well as data warehouse technologies. Such a database system is more
resilient to software/hardware failures as the number of locations where the data in
preserved is multiple, thereby eliminating any single point of failure. Additionally, the
data quality constraints of ACID (Atomicity, Consistency, Isolation, Durability) can
be preserved, though this requires technical prowess (Erb 2012).
However, distributed databases are a non-trivial endeavour, and there are some issues.
It would be beneficial if The Wine Cellar could work out some issues before
finalising the design for a distributed database. These systems are complex and extra
effort must be expended along with specialised skills to establish and maintain. The
company needs to decide if this is something it is willing to commit to right now or in
the future. Since a significant amount of data, which is proprietary will be moving
through the network, expenses must be incurred in bandwidth as well as securing the
line of communication (e.g. Virtual Private Network (VPN) over the Internet).
Technical problems would be the domain of professional database and network
managers. The company need to commit itself to the ongoing expenses and the
arrangement of necessary skills to manage the new IT system. This fulfilment of the
skills may be in the form of new hires or upgrading the skills of the existing staff.
 <Your Name> <Your Student Number> 2017 11 of 23

b) Relevance of Distributed Database for The Wine Cellar Organization and
Costs/Benefits
We opened this paper with a discussion of a centralised database system and what are
its advantages and disadvantages. As far as The Wine Cellar is concerned, if it had a
single branch, then the centralised database would have sufficed well. We are here to
help the organisation manage its growth. The company now has many outlets and
would understandably want to keep tabs on each and every outlet, not only for
monitoring purposes but also to uncover trends to help make strategic decisions. It is
our recommendation that distributed databases be considered for the new scenario of
the company. The relevance and costs/benefits of such a recommendation are detailed
next.
The company has grown from a single site to multiple sites which are located
throughout England. The IT staff that was mobilised initially was keeping in mind the
single-site nature of the business. Now, new branches have been opened, and each of
them has their local data to maintain, besides keeping the head office in the loop.
Currently, a workaround with weekly updates of files, and then a later manual
analysis has been put in place. This is labour-intensive and prone to errors. We find
that a distributed database would be better suited to handle the new scenario as well as
help the company in its growth by scaling when the number of branches and the
workload grows further. A distributed database system will provide the company
access to the latest activity and status of all data of every outlet. The benefit for the
head office is that they can run queries at any time of the day to find the information
they need. Also, the manual movement of files will be done away with and the staff
thus occupied can be used for other tasks. Authorisation controls can be put in place
so that the outlets cannot access the head-office data or one another's data, while the
head-office has free access to all data. As detailed earlier, technical challenges are the
domains of experts and those skills will have to arranged and paid for. Also, ongoing
costs will be introduced for the maintenance of the underlying infrastructure. In the
present-day business, information is power, and any delay in actionable information
can hurt business goals. It is our recommendation that company approve the expenses
 <Your Name> <Your Student Number> 2017 12 of 23