Data Mining, Data Dictionaries and Management: BUS105 Assignment

Added on  2023/04/21

Homework Assignment
AI Summary
This assignment solution for BUS105, Business Information Systems, addresses data mining concepts, data dictionaries, and data management. Part A focuses on theoretical aspects, explaining the CRISP-DM process, its phases, and the importance of data mining in business, including accuracy, relevancy, and specificity. It also identifies common problems in data mining across various industries. Part B involves a practical analysis of the 'books_unique_weeks.xlsx' file, demonstrating data management techniques such as sorting, pivoting, and creating visualizations to identify best-selling authors and the popularity of their books using column charts. The solution provides a comprehensive overview of data mining methodologies and their application in business contexts.
Running head: BUSINESS INFORMATION SYSTEMS
BUSINESS INFORMATION SYSTEMS
Name of the Student
Name of the University
Author Note
Part A:
Question 1:
Data mining is a technique for examining large databases in order to extract patterns or new
information from them. CRISP-DM, the Cross-Industry Standard Process for Data Mining, is a
guiding process for data mining that has proved beneficial in various industries
(The-modeling-agency.com, 2019). The life cycle of a data mining project contains six phases,
the respective tasks of each phase and the relationships between the tasks. The six phases of
the CRISP-DM process are shown in the following figure.
The sequence of phases is not rigid, and moving back and forth between the different phases
is always required. The result of one phase determines which phase is performed next
(Ibm.com, 2019). The arrows in the figure indicate the most significant and frequent
dependencies between phases. The phases are:
Business understanding: In the initial phase, the requirements and objectives of the business
are understood, this knowledge is converted into a data mining problem, and a plan for
solving that problem is drawn up.
Data understanding: This phase involves data collection. The collected data is then explored
through different visualizations to become familiar with it, quality problems are identified
and insights into the data are noted. Interesting patterns are also detected in order to form
hypotheses about unknown information.
Data preparation: This phase covers the activities needed to build the final dataset that is
fed into the modelling tools (The-modeling-agency.com, 2019). The tasks in this phase are the
selection of records, tables and attributes, and the transformation and filtering of data.
Modelling: In this phase different modelling techniques are chosen and applied to find the
best model for the data.
Evaluation: The results of the models developed in the previous phase are evaluated to check
whether they answer the objectives or requirements of the business. The level of confidence
in the results is also considered, to estimate how well the model(s) address the business
objectives.
Deployment: In the final phase the knowledge obtained from the developed model(s) is
organized and presented in a way that the customers can use. This mainly involves applying
the models within the decision-making process of an organization.
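The back-and-forth movement between phases can be sketched as a small transition table. This is a minimal Python sketch, not part of the assignment's method: the transitions are an assumption based on the arrows of the CRISP-DM diagram as it is usually drawn, and the names Phase, NEXT_PHASES and can_move are hypothetical.

```python
from enum import Enum


class Phase(Enum):
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELLING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


# Assumed transitions, following the arrows of the usual CRISP-DM diagram.
NEXT_PHASES = {
    Phase.BUSINESS_UNDERSTANDING: {Phase.DATA_UNDERSTANDING},
    Phase.DATA_UNDERSTANDING: {Phase.BUSINESS_UNDERSTANDING, Phase.DATA_PREPARATION},
    Phase.DATA_PREPARATION: {Phase.MODELLING},
    Phase.MODELLING: {Phase.DATA_PREPARATION, Phase.EVALUATION},
    Phase.EVALUATION: {Phase.BUSINESS_UNDERSTANDING, Phase.DEPLOYMENT},
    Phase.DEPLOYMENT: set(),
}


def can_move(current: Phase, target: Phase) -> bool:
    """Return True if the assumed diagram has an arrow from current to target."""
    return target in NEXT_PHASES[current]
```

For instance, can_move(Phase.MODELLING, Phase.DATA_PREPARATION) is true in this sketch, which mirrors the common case of returning to data preparation when a modelling technique needs the data in a different form.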
Question 2:
Data mining is very important in business and other industries because it supports informed
decision making, provides detailed insights into data and translates information into
business intelligence. The three elements of data mining are:
1. Accuracy: The result of the data mining process will be appropriate only if the data
values are of good quality. The sources from which the data is collected must be
accurate, or the conclusions drawn from the data mining will be erroneous (Cerami,
2019). For example, for a retailer who wants to gather a variety of pricing data on
competitors, the data collection tool must be able to separate different types of data
by season, source and product so that the collected data is accurate.
2. Relevancy: The context of a data set needs to be collected along with the data. The
three types of context specified by IBM are industry, data and transfer. Hence, data
mining tools must be able to detect the context of data automatically at the time of
collection (Cerami, 2019). For example, in a collected dataset containing customer
IDs, sales and customer locations, the customer IDs must be removed and the sales
segmented by customer region to answer the business question of which region's
customers buy most of the company's products.
3. Specificity: Quintillions of bytes of data are generated every day, but a particular
business does not need all of that data to meet its specific objectives (Cerami,
2019). Hence, data collection with a data mining tool should be specific to those
objectives.
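The relevancy example above (dropping customer IDs and segmenting sales by region) can be sketched in a few lines of Python. The sample rows and the function names sales_by_region and top_region are hypothetical and serve only to illustrate the idea; they are not data from the assignment.

```python
from collections import defaultdict

# Hypothetical sample rows: (customer_id, region, sales). Not real data.
rows = [
    ("C001", "North", 120.0),
    ("C002", "South", 80.0),
    ("C003", "North", 45.5),
    ("C004", "East", 60.0),
]


def sales_by_region(records):
    """Drop the customer ID and total the sales for each region."""
    totals = defaultdict(float)
    for _customer_id, region, sales in records:
        totals[region] += sales
    return dict(totals)


def top_region(records):
    """Return the region whose customers bought the most in total."""
    totals = sales_by_region(records)
    return max(totals, key=totals.get)
```

The customer ID is read but never stored, so the aggregated result carries only the context (region) that the business question needs.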
Question 3:
The common problems in data mining across various industries are: poor data quality;
conflicts between data extracted from different sources; security and privacy concerns of
individuals, governments and organizations; data that is inaccessible or difficult to
access; finding the most efficient and scalable algorithm for mining large data sets;
handling dynamic, cost-sensitive and unbalanced data; and structuring large and complex
data (Big Data Made Simple, 2019).
Part B: Data Management
In this task the books_unique_weeks.xlsx file is accessed and analysed using the following
procedure.
The data dictionary below shows the data type of each column in the file, with a
description and an example value.
Field Name     | Data Type | Description                                   | Example
Publisher      | Text      | Publisher name of the book                    | HarperCollins Publishers
Author         | Text      | Authors of the book                           | Gregory Maguire
Primary isbn10 | Numeric   | ISBN number of the book                       | 60548940
date           | Date      | Date when the name of book is entered in list | 11/20/11
Book title     | Text      | Title of the book                             | OUT OF OZ
Weeks on list  | Numeric   | The number of weeks the book is on the list   | 0
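A data dictionary like this one can also be expressed as a typed record, which makes the expected type of each column explicit when rows are loaded. The sketch below is illustrative only: BookRecord and parse_row are hypothetical names, the date format "%m/%d/%y" is assumed from the example value 11/20/11, and the ISBN is deliberately kept as text rather than numeric, because an ISBN-10 may end in "X" or begin with zeros.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass
class BookRecord:
    publisher: str
    author: str
    primary_isbn10: str  # assumption: stored as text, not as a number
    list_date: date
    book_title: str
    weeks_on_list: int


def parse_row(row: dict) -> BookRecord:
    """Convert one raw row (all values as strings) into a typed record."""
    return BookRecord(
        publisher=row["Publisher"].strip(),
        author=row["Author"].strip(),
        primary_isbn10=row["Primary isbn10"].strip(),
        # Assumed format, based on the example value in the data dictionary.
        list_date=datetime.strptime(row["date"], "%m/%d/%y").date(),
        book_title=row["Book title"].strip(),
        weeks_on_list=int(row["Weeks on list"]),
    )


# The example values from the data dictionary, passed in as strings.
example = parse_row({
    "Publisher": "HarperCollins Publishers",
    "Author": "Gregory Maguire",
    "Primary isbn10": "60548940",
    "date": "11/20/11",
    "Book title": "OUT OF OZ",
    "Weeks on list": "0",
})
```

A row whose "Weeks on list" value is not a whole number, or whose date does not match the assumed format, raises a ValueError, which surfaces data quality problems early.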
Next, the books are sorted from highest to lowest by the number of weeks on the best seller
list. The top ten books, with the corresponding author names, are shown below.
Publisher         | Author           | Primary isbn10 | date       | Book title                  | Weeks on list
Riverhead         | Paula Hawkins    | 1594634025     | 2/19/17    | THE GIRL ON THE TRAIN       | 102
Scribner          | Anthony Doerr    | 1501173219     | 05-07-2017 | ALL THE LIGHT WE CANNOT SEE | 81
Vintage           | E L James        | 525431888      | 03-05-2017 | FIFTY SHADES DARKER         | 66
St. Martin's      | Kristin Hannah   | 1466850604     | 10/29/17   | THE NIGHTINGALE             | 63
Penguin Group     | Kathryn Stockett | 1440697663     | 04-08-2012 | THE HELP                    | 58
Washington Square | Fredrik Backman  | 1476738025     | 7/23/17    | A MAN CALLED OVE            | 56
Andrews McMeel    | Rupi Kaur        | 144947425X     | 03-11-2018 | MILK AND HONEY              | 52
Bantam            | George R R Martin| 553897845      | 9/17/17    | A GAME OF THRONES           | 51
Berkley           | Liane Moriarty   | 425274861      | 5/21/17    | BIG LITTLE LIES             | 38
Ballantine        | Lisa Wingate     | 425284697      | 06-09-2018 | BEFORE WE WERE YOURS        | 35
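The same descending sort can be reproduced outside a spreadsheet. The following minimal Python sketch uses five rows copied from the table above, deliberately listed out of order; the function name top_n_by_weeks is hypothetical, and the assignment itself performed this step with spreadsheet sorting.

```python
# (book title, author, weeks on list), copied from the table above
# and deliberately listed out of order.
books = [
    ("THE HELP", "Kathryn Stockett", 58),
    ("THE GIRL ON THE TRAIN", "Paula Hawkins", 102),
    ("BIG LITTLE LIES", "Liane Moriarty", 38),
    ("ALL THE LIGHT WE CANNOT SEE", "Anthony Doerr", 81),
    ("FIFTY SHADES DARKER", "E L James", 66),
]


def top_n_by_weeks(records, n=10):
    """Sort records by weeks on list, highest first, and keep the first n."""
    return sorted(records, key=lambda record: record[2], reverse=True)[:n]


top_three = top_n_by_weeks(books, n=3)
```

Because sorted is stable, books with the same number of weeks keep their original relative order.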
Next, the data set is sorted by author name in alphabetical order, starting from A. The
first 500 rows are considered, from row 2 to row 501. The authors with more than 10 books
on the best seller list are then found by pivoting and are shown below.
Author           | Count of Books
Danielle Steel   | 32
Christine Feehan | 28
Debbie Macomber  | 26
David Baldacci   | 20
Dean Koontz      | 17
Christina Lauren | 13
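The pivot used here amounts to counting rows per author and filtering on the count. A minimal Python sketch of that idea follows; the function name authors_with_more_than is hypothetical, and the toy input uses placeholder authors and titles rather than rows from the workbook.

```python
from collections import Counter


def authors_with_more_than(rows, threshold=10):
    """Count rows per author and keep authors whose count exceeds threshold.

    rows is an iterable of (author, book_title) pairs.
    """
    counts = Counter(author for author, _title in rows)
    return {author: n for author, n in counts.most_common() if n > threshold}


# Hypothetical toy input: placeholder names, not rows from the workbook.
toy_rows = (
    [("Author A", f"Book {i}") for i in range(12)]
    + [("Author B", f"Book {i}") for i in range(3)]
)
frequent = authors_with_more_than(toy_rows)
```

Note that the comparison is strict, matching the "more than 10 books" wording: an author with exactly 10 rows would be excluded.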
Next, the sum of the number of weeks each author's books were on the best seller list is
also found by pivoting, as shown below.
Author           | Count of Books | Sum of Weeks on list
Danielle Steel   | 32             | 38
Christine Feehan | 28             | 31
Debbie Macomber  | 26             | 23
David Baldacci   | 20             | 121
Dean Koontz      | 17             | 14
Christina Lauren | 13             | 8
Next, the average number of weeks on the list per book (the sum of weeks divided by the
count of books) is shown below for each author.
Author           | Count of Books | Sum of Weeks on list | Average of Weeks on list
Danielle Steel   | 32             | 38                   | 1.1875
Christine Feehan | 28             | 31                   | 1.107142857
Debbie Macomber  | 26             | 23                   | 0.884615385
David Baldacci   | 20             | 121                  | 6.05
Dean Koontz      | 17             | 14                   | 0.823529412
Christina Lauren | 13             | 8                    | 0.615384615
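The averages in the table can be checked by dividing each author's sum of weeks by the count of books. The short Python sketch below does this with the counts and sums from the table above; the variable names are illustrative only.

```python
# (author, count of books, sum of weeks on list), taken from the table above.
pivot = [
    ("Danielle Steel", 32, 38),
    ("Christine Feehan", 28, 31),
    ("Debbie Macomber", 26, 23),
    ("David Baldacci", 20, 121),
    ("Dean Koontz", 17, 14),
    ("Christina Lauren", 13, 8),
]

# Average weeks on list per book = sum of weeks / count of books.
averages = {author: total / count for author, count, total in pivot}

# The author whose books stay on the list longest on average.
highest = max(averages, key=averages.get)
```

For Danielle Steel this gives 38 / 32 = 1.1875, which agrees with the table. The comparison also shows that the author with the most books on the list is not the one whose books stay there longest: David Baldacci has the highest average.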
Next, the author Danielle Steel is considered, and her books and the number of weeks each
was on the best seller list are shown below in a column chart.
[Figure: column chart of Weeks on list (vertical axis, 0 to 6) by Book Title (horizontal
axis) for Danielle Steel's books. Titles labelled on the axis: THE APARTMENT, THE MISTRESS,
THE DUCHESS, FALL FROM GRACE, FAIRYTALE, UNTIL THE END OF TIME, AGAINST ALL ODDS, PAST
PERFECT, HOTEL VENDOME, PRECIOUS GIFTS, FRIENDS FOREVER, RUSHING WATERS, PROPERTY OF A
NOBLEWOMAN, PRODIGAL SON, COUNTRY, THE SINS OF THE MOTHER.]
The column chart above shows that the most popular Danielle Steel books are THE APARTMENT
and THE RIGHT TIME, each of which was on the best seller list for 5 weeks. Similarly, the
popularity of the books of the top 10 authors in the list can be found using column charts.
The steps followed in Part B thus correspond to different stages of the data mining
process: they answer the business question of who the top 10 authors on the best seller
list are, and the popularity of their books is represented graphically by a column chart
as shown above.
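For readers without a spreadsheet at hand, a column chart of this kind can be approximated in plain text. The sketch below is only a stand-in for the spreadsheet chart used in the assignment: text_column_chart is a hypothetical helper, and the input contains only the two values stated in the text (5 weeks each for THE APARTMENT and THE RIGHT TIME).

```python
def text_column_chart(weeks_by_title):
    """Render one bar per title, using one '#' character per week."""
    width = max(len(title) for title in weeks_by_title)
    lines = []
    for title, weeks in weeks_by_title.items():
        lines.append(f"{title.ljust(width)} | {'#' * weeks} {weeks}")
    return "\n".join(lines)


# Only the two values stated in the text are used here.
chart = text_column_chart({"THE APARTMENT": 5, "THE RIGHT TIME": 5})
print(chart)
```

Passing each of the other authors' titles and week counts to the same helper would give the comparable view that the text describes for the remaining authors.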
References:
Big Data Made Simple. (2019). 12 common problems in Data Mining. [online] Available at: http://bigdata-madesimple.com/12-common-problems-in-data-mining/ [Accessed 5 Jan. 2019].
Cerami, G. (2019). 3 Major Elements of Data Mining | Connotate. [online] Connotate. Available at: https://www.connotate.com/three-major-elements-data-mining/ [Accessed 5 Jan. 2019].
Ibm.com. (2019). IBM Knowledge Center. [online] Available at: https://www.ibm.com/support/knowledgecenter/en/SS3RA7_15.0.0/com.ibm.spss.crispdm.help/crisp_overview.htm [Accessed 5 Jan. 2019].
The-modeling-agency.com. (2019). [online] Available at: https://www.the-modeling-agency.com/crisp-dm.pdf [Accessed 5 Jan. 2019].