Data Mining Project: Bestseller Book Dataset Analysis


Added on  2025/04/04

This project analyzes bestseller book data using data mining techniques.
Business Information System
Contents
Part A
Part B
Q1 Datasets
Q2: Data Dictionary
Q3. Sort the books in terms of number of weeks i.e. in descending order
Q4: Sort the date on the author and list them on the top
4.1 Sum of the total number of weeks for the authors having more than 10 books
4.2 Average number of books
Q5
References
Part A
Question 1: Provide a definition of data mining and describe the six phases (see diagram
below) of the Cross-Industry Standard Process for Data Mining (CRISP-DM). In your own
words, explain what data mining is and describe each of the six phases of CRISP-DM
Data mining- Data mining is the process of discovering patterns in large databases using
a range of methods drawn from database systems, machine learning and statistics
(Chapman et al., 2000).
Cross-Industry Standard Process for Data Mining (CRISP-DM) is an industry-standard process
model that provides guidance for data mining efforts (CRISP-DM, 2012).
The six phases of CRISP-DM shown in the diagram below are explained in turn:
Figure- Phases of the CRISP-DM reference model
Source- By Author, 2019
Business Understanding- This initial phase concentrates on the requirements of the project
from a business perspective. This knowledge is then converted into a data mining problem
definition, and a preliminary plan is designed to achieve the objectives.
Data Understanding- This phase begins with the collection of the data and proceeds with
activities that build familiarity with the data, identify data quality problems, and detect
interesting subsets from which hypotheses about hidden information can be formed.
Data preparation- This phase covers all the activities needed to construct the final dataset
from the raw data. These tasks are likely to be performed several times, and not necessarily
in a set order. They include table, record and attribute selection, as well as cleaning the
data for the modelling tools.
Modeling- In this phase, modelling techniques are selected and applied, and their parameters
are calibrated to optimal values. There are usually several techniques for the same type of
data mining problem. Some techniques have specific requirements on the form of the data.
Evaluation- In this phase, the model is evaluated to confirm that it achieves the objectives
of the business. A key aim is to check whether any important business issue has not been
sufficiently addressed. At the end of the phase, a decision is reached on how the results of
the data mining will be used.
Deployment- In this phase, the results of the data mining are put to use, which can extend to
implementing a repeatable data mining process across the enterprise. It is necessary to
understand the actions the customer must take in order to make use of the generated models.
Question 2: Explain why data mining is important and describe the three elements of data
mining in your own words. Provide a real world example of each element.
Data mining plays a crucial role in the development of business strategies and in decision
making. It shows how data is gathered, where the data comes from, and how the information
is translated into value for the business (Cerami, 2018). Data mining is a truly data-driven
method that works effectively and efficiently. The three important elements of data mining
are described below:
Figure- Elements of Data mining
Source- By Author, 2019
Accuracy- Data mining should ensure that the data collected is accurate and that data quality
is maintained, so that erroneous data does not lead to erroneous conclusions or distort the
results.
For example- A retailer needs to collect pricing information by comparing prices from
numerous competitors. The data must accurately record whether pricing trends depend on
season or geography, and whether the products being compared are the same or different.
Relevancy- It is important that the sources from which data is collected are relevant.
Machine learning and AI technology help to bridge the gap in understanding context.
Machine learning deals with three types of context: industry, data and transfer.
For example- Data mining helps to keep information relevant by combining it with ML or AI
technology when crafting MySQL queries.
Specificity- Data mining helps to mine information that is specific and reduces the
harvesting of irrelevant scraped data. It ensures that the information mined is vital and
useful for the goals of the company, its customers and the industry.
For example- Data mining is used to find, fairly and quickly, the valuable data that is
useful for the company's goals and its customers.
Question 3: Using the last articles on analytics and common problems in data mining, write
a paragraph in your own words on three possible problems/challenges in data mining.
Three common problems in data mining are described below:
Relevant data may be unavailable, or it may be difficult to access in a structured format
(Managing the Analytics Life Cycle for Decisions at Scale, 2018).
The quality of the data may be poor, for example dirty data, noisy data or incorrect values;
the data may be inadequate in size or contain missing values; and there may be a lack of
transparency (Data Mining, 2015).
Dealing with non-static, cost-sensitive and unbalanced data is also a challenge in data mining.
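Two of the problems above, missing values and unbalanced data, can be checked with a few lines of code. The following is a minimal sketch in Python, not part of the original analysis; the function names and the sample rows are hypothetical and serve only as illustration.

```python
from collections import Counter


def missing_counts(rows, fields):
    """Count, per field, how many rows have no usable value (None or empty string)."""
    return {f: sum(1 for r in rows if r.get(f) in (None, "")) for f in fields}


def class_shares(labels):
    """Return each label's share of the total, which exposes unbalanced data."""
    total = len(labels)
    return {label: count / total for label, count in Counter(labels).items()}


# Made-up rows for illustration only.
rows = [{"a": 1, "b": ""}, {"a": None, "b": "x"}, {"b": "y"}]
print(missing_counts(rows, ["a", "b"]))          # {'a': 2, 'b': 1}
print(class_shares(["yes", "no", "no", "no"]))   # {'yes': 0.25, 'no': 0.75}
```

A field with many missing entries, or a label that holds only a small share of the rows, would be a signal to revisit data preparation before modelling.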
Part B
Q1 Datasets
Figure 1: Datasets
The figure above shows the dataset of books by bestselling authors for the period 2011-2018;
it contains several variables.
Q2: Data Dictionary
Each variable in the books dataset has a datatype, and the entries in each variable are
filled in according to that datatype.
S.no.   Variable            Datatype
1       Publisher           Text
2       Primary isbn10      Number
3       Title of the book   Text
4       Author Name         Text
5       Weeks on list       Number
6       Date                Number
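A data dictionary like this can also be used to check that rows match the declared datatypes. The following is a minimal sketch in Python, assuming each row is held as a dictionary keyed by the variable names in the table; the function name and the sample row are hypothetical and not taken from the dataset.

```python
# Data dictionary from the table above: variable name -> datatype.
DATA_DICTIONARY = {
    "Publisher": "Text",
    "Primary isbn10": "Number",
    "Title of the book": "Text",
    "Author Name": "Text",
    "Weeks on list": "Number",
    "Date": "Number",
}


def check_row(row):
    """Return the names of variables whose value does not match the dictionary."""
    problems = []
    for name, datatype in DATA_DICTIONARY.items():
        value = row.get(name)
        if datatype == "Text":
            ok = isinstance(value, str)
        else:
            # bool is a subclass of int, so exclude it explicitly.
            ok = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not ok:
            problems.append(name)
    return problems


# Made-up sample row for illustration only.
sample = {
    "Publisher": "Example House",
    "Primary isbn10": 1234567890,
    "Title of the book": "Example Title",
    "Author Name": "A. Author",
    "Weeks on list": 12,
    "Date": 20180101,
}
print(check_row(sample))  # []
```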
Q3. Sort the books in terms of number of weeks i.e. in descending order
Figure 2: Sorting of authors
In this task, all the books are sorted by number of weeks in descending order. The sorting
is based on the bestseller list, so the books with the highest number of weeks on the list
appear at the top, followed by the rest in decreasing order (Jackson, 2016).
To sort the books in the given dataset, we created a table and sorted it with the SORT
function. After sorting the books, the 10 most popular authors were found to be:
Figure 3: 10 most popular authors
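The sorting in this question was done with a spreadsheet SORT function. An equivalent can be sketched in Python as follows. This is an illustration only: the field names (title, author, weeks_on_list), the function names, and the sample rows are assumptions, and the top-authors rule (first distinct authors met in the sorted list) is one possible reading of "most popular".

```python
def sort_by_weeks(books):
    """Return the books ordered by weeks on the list, highest first."""
    return sorted(books, key=lambda b: b["weeks_on_list"], reverse=True)


def top_authors(books, n=10):
    """Return the first n distinct authors met when walking the sorted books."""
    seen = []
    for book in sort_by_weeks(books):
        if len(seen) >= n:
            break
        if book["author"] not in seen:
            seen.append(book["author"])
    return seen


# Made-up rows for illustration only.
books = [
    {"title": "Book A", "author": "Author X", "weeks_on_list": 3},
    {"title": "Book B", "author": "Author Y", "weeks_on_list": 15},
    {"title": "Book C", "author": "Author X", "weeks_on_list": 9},
    {"title": "Book D", "author": "Author Z", "weeks_on_list": 1},
]
print([b["title"] for b in sort_by_weeks(books)])  # ['Book B', 'Book C', 'Book A', 'Book D']
print(top_authors(books, n=2))                     # ['Author Y', 'Author X']
```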
Q4: Sort the date on the author and list them on the top.
Figure 4: Authors' names in ascending order
The figure above shows the names of the authors in ascending order, starting with A.
Figure 5: Authors with more than 10 books
The figure above shows the authors who have more than ten books. It was produced after
sorting the authors by name from A to Z. A filter was then set on the weeks-on-list column
using the 'Values equal to or more than' option, with the value set to 10.
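The steps described above (sort by author name, then filter on weeks with a minimum of 10) can be sketched in Python as follows. The field and function names and the sample rows are assumptions made for illustration. The last function counts books per author, which is a separate reading of "authors with more than 10 books" and is not the filter the text describes.

```python
from collections import Counter


def authors_a_to_z(books):
    """Sort the rows by author name, A to Z, ignoring letter case."""
    return sorted(books, key=lambda b: b["author"].lower())


def filter_min_weeks(books, minimum=10):
    """Keep rows whose weeks on the list equal or exceed the minimum."""
    return [b for b in books if b["weeks_on_list"] >= minimum]


def authors_with_more_than(books, threshold=10):
    """Return, in A to Z order, the authors who have more than `threshold` books."""
    counts = Counter(b["author"] for b in books)
    return sorted(a for a, c in counts.items() if c > threshold)


# Made-up rows for illustration only.
books = [
    {"title": "Book A", "author": "Baker", "weeks_on_list": 12},
    {"title": "Book B", "author": "adams", "weeks_on_list": 4},
    {"title": "Book C", "author": "Clark", "weeks_on_list": 10},
    {"title": "Book D", "author": "Baker", "weeks_on_list": 2},
]
filtered = filter_min_weeks(authors_a_to_z(books))
print([b["title"] for b in filtered])      # ['Book A', 'Book C']
print(authors_with_more_than(books, 1))    # ['Baker']
```

The threshold of 1 in the usage line is only there so the tiny sample produces a result; the default of 10 matches the value given in the text.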