BUS105 - Data Mining, CRISP-DM, and Problems in Business Systems


Added on  2023/05/27

This report provides an overview of data mining within the context of business information systems. It explains data mining, detailing the six phases of the CRISP-DM methodology: business understanding, data understanding, data preparation, modeling, evaluation, and deployment. The report highlights the importance of data mining, emphasizing accuracy, relevancy, and specificity as key elements. It also addresses common problems encountered in data mining, such as data unavailability, managing large databases, and poor data quality. Additionally, the report includes findings from an analysis of popular authors and their book performance, using data to illustrate practical applications of data mining principles.
BUSINESS INFORMATION SYSTEMS
Business Information Systems
MUHAMMAD SULEMAN
KAPLAN BUSINESS SCHOOL
Table of Contents
Part A:
  Question 1: Explaining data mining and describing each of the six phases of CRISP-DM
  Question 2: Explaining why data mining is important and describing the elements of data mining
  Question 3: Describing the three possible problems in data mining
Part B: Findings of the section
References:
Appendices:
Part A:
Question 1: Explaining data mining and describing each of the six phases of CRISP-DM
Data mining is a practice used by organisations to examine large, pre-existing databases in order to generate new information. It is mainly an analytical tool for exploring data and extracting the information needed for analysis. The six phases of CRISP-DM (the Cross-Industry Standard Process for Data Mining) are described below.
Business understanding: This phase establishes the project objectives and requirements from a business perspective; these then guide the data mining process (Larose and Larose 2014).
Data understanding: This phase familiarises the analyst with the data, starting from initial data collection, and identifies data quality issues at an early stage by examining first insights.
Data preparation: This phase involves selecting, recoding and cleansing the data that will be used in the data mining process.
Modelling: This phase selects and applies suitable data mining tools (modelling techniques) to the prepared data.
Evaluation: This stage evaluates the results and determines whether they meet the business objectives. It also identifies any issues that need to be addressed before the project can be completed (Wu et al. 2014).
Deployment: This stage puts the model into practice, which helps the organisation continue mining and using the data.
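The six phases are not a strictly one-way sequence. CRISP-DM allows a project to return to an earlier phase when needed, for example from evaluation back to business understanding when the results do not meet the objectives. The Python sketch below encodes a simplified version of that flow. The `Phase` enum, the `TRANSITIONS` table and the `is_valid_path` helper are hypothetical names chosen for illustration. The loops shown are an assumed simplification and do not fully specify the methodology.

```python
from enum import Enum


class Phase(Enum):
    """The six CRISP-DM phases, in their usual order."""
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELLING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


# Simplified map of which phase may follow which (an assumption for
# illustration): the forward path plus common feedback loops.
TRANSITIONS = {
    Phase.BUSINESS_UNDERSTANDING: {Phase.DATA_UNDERSTANDING},
    Phase.DATA_UNDERSTANDING: {Phase.BUSINESS_UNDERSTANDING, Phase.DATA_PREPARATION},
    Phase.DATA_PREPARATION: {Phase.MODELLING},
    Phase.MODELLING: {Phase.DATA_PREPARATION, Phase.EVALUATION},
    # Results that miss the business objectives send the project back.
    Phase.EVALUATION: {Phase.BUSINESS_UNDERSTANDING, Phase.DEPLOYMENT},
    # Deployment can start a new cycle of mining.
    Phase.DEPLOYMENT: {Phase.BUSINESS_UNDERSTANDING},
}


def is_valid_path(path):
    """Return True if each consecutive pair of phases is an allowed transition."""
    return all(nxt in TRANSITIONS[cur] for cur, nxt in zip(path, path[1:]))


straight_through = list(Phase)  # phases in definition order
skips_preparation = [Phase.DATA_UNDERSTANDING, Phase.MODELLING]
print(is_valid_path(straight_through))   # True
print(is_valid_path(skips_preparation))  # False
```

Using a transition table rather than a fixed list makes the iterative character of CRISP-DM explicit. A project log can then be checked against it.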
Question 2: Explaining why data mining is important and describing the elements of data mining
Data mining is important because it allows organisations to analyse large, complex datasets and derive the outputs they need for future decision-making. Three elements of data mining, namely accuracy, relevancy and specificity, shape how well an organisation can evaluate its data. They are described below.
Accuracy: Accuracy is the main element of data mining, because the technique relies on high-quality, accurate data for its analysis. Erroneous input produces erroneous output, which negatively affects the end results.
Relevancy: Relevancy of the data is also essential, as it determines which tools and programs can be used to harvest the data. Harvesting should follow specific criteria, accumulating only relevant data and using logical expressions to rein in the harvest parameters (Roiger 2017).
Specificity: Specificity is essential because it allows companies to evaluate data according to their particular needs. The data collection process therefore needs to be targeted. No extra data should be collected, as it can negatively affect the overall data mining process.
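The three elements can be illustrated with a small Python sketch over bestseller-style records. It applies each element in turn:

- an accuracy check rejects records with missing or impossible values;
- a logical expression restricts the harvest to relevant records (relevancy);
- only the fields the analysis needs are kept (specificity).

The first two records are copied from the appendix data. The third is a deliberately bad, hypothetical record. The field names and the `is_accurate` and `harvest` functions are assumptions made for illustration.

```python
def is_accurate(record):
    """Accuracy: reject records with a missing title or an impossible week count."""
    weeks = record.get("weeks_on_list")
    return bool(record.get("title")) and isinstance(weeks, int) and weeks >= 0


def harvest(records, min_weeks, fields):
    """Keep accurate records that satisfy a logical criterion (relevancy),
    projected onto only the fields the analysis needs (specificity)."""
    selected = []
    for rec in records:
        # Relevancy: a logical expression reins in what is harvested.
        if is_accurate(rec) and rec["weeks_on_list"] >= min_weeks:
            # Specificity: drop every field the analysis does not need.
            selected.append({f: rec[f] for f in fields})
    return selected


records = [
    {"title": "THE GIRL ON THE TRAIN", "author": "Paula Hawkins",
     "publisher": "Riverhead", "weeks_on_list": 102},
    {"title": "BIG LITTLE LIES", "author": "Liane Moriarty",
     "publisher": "Berkley", "weeks_on_list": 38},
    # Deliberately bad record (hypothetical) to exercise the accuracy check.
    {"title": "", "author": "Unknown", "publisher": "Unknown", "weeks_on_list": -1},
]

print(harvest(records, min_weeks=50, fields=["title", "weeks_on_list"]))
# [{'title': 'THE GIRL ON THE TRAIN', 'weeks_on_list': 102}]
```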
Question 3: Describing the three possible problems in data mining
Data mining faces several problems that can reduce the functionality of the process and the quality of its results. Three specific problems are identified below.
• The major problem in data mining is the unavailability of data, or difficulty in accessing it. Data mining requires adequate information, and its unavailability can negatively affect the overall process (Zheng 2015).
• The second problem is dealing with huge databases, which require distributed approaches to ensure that adequate data is selected for the mining process.
• The third problem is poor data quality. Incorrect and inexact values in the data can negatively affect the results obtained from the process.
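Poor data quality can be measured before modelling begins. The pressure of a large table can also be eased by not loading it all at once. The Python sketch below reads records in fixed-size chunks and counts missing and out-of-range values per field. It is a single-machine illustration and does not implement the distributed approaches mentioned above. The `chunks` and `profile_quality` names, the validity rules and the sample rows are all hypothetical.

```python
from collections import Counter
from itertools import islice


def chunks(iterable, size):
    """Yield successive lists of at most `size` items from any iterable."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def profile_quality(rows, validators, chunk_size=1000):
    """Count missing and invalid values per field, one chunk at a time.

    `validators` maps a field name to a function returning True for a valid value.
    """
    missing, invalid, total = Counter(), Counter(), 0
    for batch in chunks(rows, chunk_size):
        for row in batch:
            total += 1
            for field, is_valid in validators.items():
                value = row.get(field)
                if value in (None, ""):
                    missing[field] += 1      # unavailable value
                elif not is_valid(value):
                    invalid[field] += 1      # present but incorrect
    return {"rows": total, "missing": dict(missing), "invalid": dict(invalid)}


# Hypothetical rows: one clean, one with a missing author, one with a negative count.
rows = [
    {"author": "Author A", "weeks_on_list": 12},
    {"author": "", "weeks_on_list": 5},
    {"author": "Author C", "weeks_on_list": -3},
]
validators = {
    "author": lambda v: isinstance(v, str),
    "weeks_on_list": lambda v: isinstance(v, int) and v >= 0,
}
print(profile_quality(rows, validators, chunk_size=2))
# {'rows': 3, 'missing': {'author': 1}, 'invalid': {'weeks_on_list': 1}}
```

Because `chunks` accepts any iterable, the same function could consume rows streamed from a file or database cursor instead of an in-memory list.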
Part B: Findings of the section
No.  Publisher          Author             Weeks on list
1    Riverhead          Paula Hawkins      102
2    Scribner           Anthony Doerr      81
3    Vintage            E L James          66
4    St. Martin's       Kristin Hannah     63
5    Penguin Group      Kathryn Stockett   58
6    Washington Square  Fredrik Backman    56
7    Andrews McMeel     Rupi Kaur          52
8    Bantam             George R R Martin  51
9    Berkley            Liane Moriarty     38
10   Ballantine         Lisa Wingate       35
The table above lists the ten most popular authors in terms of weeks on the list, derived from the overall data. Paula Hawkins is the most popular author in the list, as her book remained on the bestseller list for 102 weeks.
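Ranking authors by weeks on the list is a simple sort-and-select step. Below is a minimal Python sketch using a few rows copied from the appendix. The `top_n` helper is a hypothetical name. `heapq.nlargest` from the standard library performs the selection without sorting the whole dataset.

```python
import heapq

# (author, book title, weeks on list): a subset of rows from the appendix.
books = [
    ("Anthony Doerr", "ALL THE LIGHT WE CANNOT SEE", 81),
    ("Liane Moriarty", "BIG LITTLE LIES", 38),
    ("Paula Hawkins", "THE GIRL ON THE TRAIN", 102),
    ("E L James", "FIFTY SHADES DARKER", 66),
    ("Lisa Wingate", "BEFORE WE WERE YOURS", 35),
]


def top_n(rows, n):
    """Return the n rows with the most weeks on the list, highest first."""
    return heapq.nlargest(n, rows, key=lambda row: row[2])


for rank, (author, title, weeks) in enumerate(top_n(books, 3), start=1):
    print(rank, author, weeks)
# 1 Paula Hawkins 102
# 2 Anthony Doerr 81
# 3 E L James 66
```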
Author                             Book titles  Total weeks  Average weeks
Danielle Steel                     32           38           1.19
Susan Mallery                      30           28           0.93
Christine Feehan                   28           31           1.11
Stuart Woods                       27           23           0.85
Debbie Macomber                    26           23           0.88
Nora Roberts                       25           57           2.28
Ron Carr                           23           21           0.91
Maya Banks                         22           24           1.09
Marie Force                        20           14           0.70
David Baldacci                     20           121          6.05
Fern Michaels                      19           13           0.68
Iris Johansen                      17           16           0.94
Dean Koontz                        17           14           0.82
Lee Child                          16           56           3.50
Michael Connelly                   16           44           2.75
J A Jance                          16           16           1.00
Stephen King                       15           73           4.87
Jill Shalvis                       15           9            0.60
Lisa Scottoline                    14           15           1.07
Sandra Brown                       14           18           1.29
Karen Kingsbury                    14           13           0.93
Kristen Ashley                     14           11           0.79
John Sandford                      14           37           2.64
Lynsay Sands                       14           12           0.86
J R Ward                           13           10           0.77
Janet Evanovich                    13           29           2.23
Christina Lauren                   13           8            0.62
J D Robb                           13           14           1.08
Kresley Cole                       12           9            0.75
Linda Lael Miller                  12           12           1.00
James Patterson and Maxine Paetro  12           20           1.67
John Grisham                       11           72           6.55
Sherrilyn Kenyon                   11           7            0.64
Lisa Gardner                       11           20           1.82
James Patterson                    11           58           5.27
Douglas Preston and Lincoln Child  11           13           1.18
Elin Hilderbrand                   11           11           1.00
The table above lists the authors who have more than 10 books on the bestseller list. For each author it also shows the total number of weeks their books were listed and the average number of weeks per book (Torgo 2016).
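The summary table has the structure of a spreadsheet pivot table. Rows are grouped by author, and for each group the book-title rows are counted and the weeks on the list are summed and averaged. The average is the sum divided by the count; for Danielle Steel, 38 ÷ 32 = 1.1875. Below is a minimal Python sketch of the same aggregation. The `pivot_by_author` name and the sample rows are hypothetical. The `min_titles` parameter stands in for the table's cut-off.

```python
from collections import defaultdict


def pivot_by_author(rows, min_titles=1):
    """Group (author, title, weeks) rows by author and return
    {author: (count of title rows, sum of weeks, average weeks)},
    keeping only authors with at least `min_titles` rows."""
    weeks_by_author = defaultdict(list)
    for author, _title, weeks in rows:
        weeks_by_author[author].append(weeks)
    summary = {}
    for author, weeks in weeks_by_author.items():
        count, total = len(weeks), sum(weeks)
        if count >= min_titles:
            summary[author] = (count, total, total / count)
    # Sort by count of titles, largest first, as in the table above.
    return dict(sorted(summary.items(), key=lambda kv: kv[1][0], reverse=True))


# Hypothetical rows for illustration only.
rows = [
    ("Author A", "Title 1", 1),
    ("Author A", "Title 2", 3),
    ("Author A", "Title 3", 2),
    ("Author B", "Title 4", 10),
    ("Author B", "Title 5", 4),
    ("Author C", "Title 6", 7),
]
print(pivot_by_author(rows, min_titles=2))
# {'Author A': (3, 6, 2.0), 'Author B': (2, 14, 7.0)}
```

On the full dataset, calling it with `min_titles=11` would correspond to the "more than 10 books" cut-off used above.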
[Figure: Column charts of weeks on the bestseller list (y-axis) for each book title (x-axis), with one panel per author: Danielle Steel, Susan Mallery, Christine Feehan and Debbie Macomber.]
The column graphs above show how long the books of the authors with the most bestseller titles remained on the bestseller list. Danielle Steel is one of the biggest authors in the data, with 32 books on the bestseller list. Her graph plots her books against the number of weeks each spent on the list. The graphs for Susan Mallery, Christine Feehan and Debbie Macomber present the same information for their books, which helps in comparing the bestsellers published by each author. Overall, the data shows that Danielle Steel has published the highest number of bestselling books (Aggarwal 2015).
References:
Aggarwal, CC 2015, ‘Data mining: the textbook’, Springer.
Chen, F, Deng, P, Wan, J, Zhang, D, Vasilakos, AV & Rong, X 2015, ‘Data mining for the internet of things: literature review and challenges’, International Journal of Distributed Sensor Networks, vol. 11, no. 8, p. 431047.
Larose, DT & Larose, CD 2014, ‘Discovering knowledge in data: an introduction to data mining’, John Wiley & Sons.
Roiger, RJ 2017, ‘Data mining: a tutorial-based primer’, Chapman and Hall/CRC.
Torgo, L 2016, ‘Data mining with R: learning with case studies’, Chapman and Hall/CRC.
Witten, IH, Frank, E, Hall, MA & Pal, CJ 2016, ‘Data mining: practical machine learning tools and techniques’, Morgan Kaufmann.
Wu, X, Zhu, X, Wu, GQ & Ding, W 2014, ‘Data mining with big data’, IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 1, pp. 97-107.
Zheng, Y 2015, ‘Trajectory data mining: an overview’, ACM Transactions on Intelligent Systems and Technology (TIST), vol. 6, no. 3, p. 29.
Appendices:
No.  Publisher          Author             Primary ISBN-10  Date        Book title                   Weeks on list
1    Riverhead          Paula Hawkins      1594634025       2/19/17     THE GIRL ON THE TRAIN        102
2    Scribner           Anthony Doerr      1501173219       05-07-2017  ALL THE LIGHT WE CANNOT SEE  81
3    Vintage            E L James          525431888        03-05-2017  FIFTY SHADES DARKER          66
4    St. Martin's       Kristin Hannah     1466850604       10/29/17    THE NIGHTINGALE              63
5    Penguin Group      Kathryn Stockett   1440697663       04-08-2012  THE HELP                     58
6    Washington Square  Fredrik Backman    1476738025       7/23/17     A MAN CALLED OVE             56
7    Andrews McMeel     Rupi Kaur          144947425X       03-11-2018  MILK AND HONEY               52
8    Bantam             George R R Martin  553897845        9/17/17     A GAME OF THRONES            51
9    Berkley            Liane Moriarty     425274861        5/21/17     BIG LITTLE LIES              38
10   Ballantine         Lisa Wingate       425284697        06-09-2018  BEFORE WE WERE YOURS         35
Author                             Book titles  Total weeks  Average weeks
Danielle Steel                     32           38           1.19
Susan Mallery                      30           28           0.93
Christine Feehan                   28           31           1.11
Stuart Woods                       27           23           0.85
Debbie Macomber                    26           23           0.88
Nora Roberts                       25           57           2.28
Ron Carr                           23           21           0.91
Maya Banks                         22           24           1.09
Marie Force                        20           14           0.70
David Baldacci                     20           121          6.05
Fern Michaels                      19           13           0.68
Iris Johansen                      17           16           0.94
Dean Koontz                        17           14           0.82
Lee Child                          16           56           3.50
Michael Connelly                   16           44           2.75
J A Jance                          16           16           1.00
Stephen King                       15           73           4.87
Jill Shalvis                       15           9            0.60
Lisa Scottoline                    14           15           1.07
Sandra Brown                       14           18           1.29
Karen Kingsbury                    14           13           0.93
Kristen Ashley                     14           11           0.79
John Sandford                      14           37           2.64
Lynsay Sands                       14           12           0.86
J R Ward                           13           10           0.77
Janet Evanovich                    13           29           2.23
Christina Lauren                   13           8            0.62
J D Robb                           13           14           1.08
Kresley Cole                       12           9            0.75
Linda Lael Miller                  12           12           1.00
James Patterson and Maxine Paetro  12           20           1.67
John Grisham                       11           72           6.55
Sherrilyn Kenyon                   11           7            0.64
Lisa Gardner                       11           20           1.82
James Patterson                    11           58           5.27
Douglas Preston and Lincoln Child  11           13           1.18
Elin Hilderbrand                   11           11           1.00
Kristin Hannah                     10           85           8.50
Sherryl Woods                      10           7            0.70
Philippa Gregory                   10           8            0.80
Lisa Jackson                       10           10           1.00