ANL203 Analytics for Decision Making: Data Mining Project Report


Added on  2023/04/26

AI Summary
This assignment solution provides a detailed analysis of a data mining project using the CRISP-DM framework. It covers project understanding, data understanding, data preparation, modeling, evaluation, and deployment phases. The solution also identifies variables to improve predictive models, discusses limitations of the data, calculates hit rates, and outlines prerequisites and limitations of data mining projects. Furthermore, it explores association analysis and its application in increasing company revenue. The document is contributed by a student and available on Desklib, a platform offering study tools for students.
Data mining
Student's name:
Course:
Professor’s name:
Date:
a) Planning a project with the CRISP-DM framework
Planning a project with the CRISP-DM framework involves six phases, described below:
Phase 1: Business (project) understanding
The first phase involves understanding the nature and type of the project to be executed. The project objectives should be determined, and the key persons, their roles and the project units should be clearly outlined.
The current situation of the project, including its resources, constraints and risks, should then be assessed to establish whether data mining suits the project. The data mining goals should then be determined and a project plan produced.
Phase 2: Data understanding
The properties of the important attributes of the data should be explored in detail. Data quality should also be verified by checking whether the data cover all the required cases and whether the values are plausible.
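The quality checks in Phase 2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not part of the original project: the records, the field names (customer_id, age, income) and the plausible ranges are all hypothetical. The sketch counts missing values per attribute, as a simple coverage check, and lists the records whose values fall outside the assumed plausible range.

```python
# Minimal data-understanding sketch. The records, field names and
# plausible ranges below are hypothetical, chosen only for illustration.
records = [
    {"customer_id": 1, "age": 34, "income": 52000},
    {"customer_id": 2, "age": None, "income": 48000},
    {"customer_id": 3, "age": 212, "income": 61000},
    {"customer_id": 4, "age": 45, "income": -100},
]

# Assumed plausible (low, high) range for each numeric attribute.
PLAUSIBLE_RANGES = {"age": (18, 100), "income": (0, 1_000_000)}


def profile(rows, ranges):
    """For each attribute, count missing values and list ids with implausible values."""
    report = {}
    for field, (low, high) in ranges.items():
        missing = 0
        implausible = []
        for row in rows:
            value = row.get(field)
            if value is None:
                missing += 1  # the data do not cover this case
            elif not (low <= value <= high):
                implausible.append(row["customer_id"])  # value outside the plausible range
        report[field] = {"missing": missing, "implausible_ids": implausible}
    return report


if __name__ == "__main__":
    for field, summary in profile(records, PLAUSIBLE_RANGES).items():
        print(field, summary)
```

In practice the plausible ranges would come from domain knowledge agreed during Phase 1 rather than being hard-coded.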
Phase 3: Data preparation
Before the project data are analyzed, they should be prepared. Data preparation involves the following steps:
Data selection: A decision should be made on which data to use, with an explanation of why certain data were selected or excluded.
Data cleaning: Noise should be corrected or removed, and a decision should be made on how to handle special values and how to interpret them.
Data integration: Data from different sources should be combined and the results stored.
Data formatting: Data should be reordered, rearranged and reformatted as the project requires.
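As a rough sketch of the four preparation steps, the Python below works on two small hypothetical sources: a customer list and a mapping from customer id to whether the customer bought the cover. The field names, the decision to drop records with a missing age, and the final column order are illustrative assumptions, not the project's actual rules.

```python
# Hypothetical first source: customer records, including an irrelevant "notes" field.
customers = [
    {"id": 1, "name": " alice ", "age": 34, "notes": "x"},
    {"id": 2, "name": "BOB", "age": None, "notes": "y"},
    {"id": 3, "name": "Carol", "age": 51, "notes": "z"},
]
# Hypothetical second source: customer id -> did the customer buy the cover?
policies = {1: "yes", 3: "no"}

# 1. Selection: keep only attributes relevant to the mining goal ("notes" is excluded).
SELECTED = ("id", "name", "age")
selected = [{k: row[k] for k in SELECTED} for row in customers]

# 2. Cleaning: drop records with a missing age and normalise the name text.
cleaned = [
    {**row, "name": row["name"].strip().title()}
    for row in selected
    if row["age"] is not None
]

# 3. Integration: attach the purchase outcome from the second source by id.
integrated = [{**row, "purchased": policies.get(row["id"], "unknown")} for row in cleaned]

# 4. Formatting: order records by age and fix the column order expected downstream.
COLUMNS = ("id", "age", "name", "purchased")
formatted = [
    tuple(row[c] for c in COLUMNS)
    for row in sorted(integrated, key=lambda r: r["age"])
]

if __name__ == "__main__":
    for record in formatted:
        print(record)
```

Dropping incomplete records is only one cleaning choice; imputing the missing value is a common alternative when records are scarce.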
Phase 4: Modelling
This phase involves selecting a modelling technique based on the data mining objectives. A test design should be generated, specifying how the dataset will be divided into training, validation and test sets.
The model is then built by running the selected technique on the input variables, and its results are assessed and interpreted.
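The division into training, validation and test sets could look like the minimal sketch below. The 60/20/20 proportions and the fixed random seed are assumptions chosen for illustration; the project does not specify them.

```python
import random


def split_dataset(rows, train=0.6, validation=0.2, seed=42):
    """Shuffle rows and split them into training, validation and test sets.

    The default 60/20/20 proportions are an illustrative assumption; whatever
    remains after the training and validation shares becomes the test set.
    """
    if not (0 < train and 0 <= validation and train + validation < 1):
        raise ValueError("proportions must leave room for a test set")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * validation)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


if __name__ == "__main__":
    train_set, val_set, test_set = split_dataset(range(100))
    print(len(train_set), len(val_set), len(test_set))
```

Shuffling before splitting avoids any ordering in the source data (for example, by date of call) leaking into one of the sets.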
Phase 5: Evaluation
The results of the data mining model should be evaluated to check how well they meet the data mining goals.
The data mining process should then be reviewed, and the next steps decided on the basis of this review.
Phase 6: Deployment
A decision should be made on how the results of the process will be used, who needs them and how often they will need them.
The results should then be presented to the users.
b) One variable that can help improve the predictive model.
One variable not included in Table 1 that would help improve the predictive model is the frequency of telephone calls to customers, that is, the number of times the marketing team called each customer about the insurance cover. It is expected that customers who are contacted more frequently
about the insurance cover would have more information and understand the cover better. They are therefore more likely to adopt it.
In contrast, customers who are called less frequently would have limited information about the cover and might therefore adopt it more slowly than customers who are called more often.
Stephen (2017) argues that a heavily promoted product attracts higher sales than a less promoted one.
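If the company kept a log of its marketing calls, the proposed variable could be derived by counting calls per customer. The sketch below assumes a hypothetical log of (customer_id, call_date) pairs and a hypothetical list of customer ids; customers who were never called receive a frequency of 0, so the variable is defined for every customer in the model.

```python
from collections import Counter

# Hypothetical call log: one (customer_id, call_date) entry per marketing call.
call_log = [
    (101, "2023-01-05"),
    (102, "2023-01-06"),
    (101, "2023-01-12"),
    (103, "2023-01-15"),
    (101, "2023-01-20"),
    (103, "2023-01-22"),
]

# Hypothetical customer base; 104 was never called.
customer_ids = [101, 102, 103, 104]


def call_frequency(log, customers):
    """Return the number of marketing calls made to each customer (0 if never called)."""
    counts = Counter(customer for customer, _ in log)
    return {cid: counts[cid] for cid in customers}  # Counter returns 0 for missing keys


if __name__ == "__main__":
    print(call_frequency(call_log, customer_ids))
```

The resulting counts would be joined to the other customer attributes as an extra input variable before modelling.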
c) One possible limitation of the data that can affect model performance in deployment
One possible limitation of the data is the way in which the promotion was carried out. Promotion is an important aspect of sales because it makes information available to target customers, so appropriate promotion channels should be used to attract more customers. A poor choice of promotion channel limits the number of interested customers.
The company chose to promote the cover through telephone calls rather than through outreach promotions, possibly because telephone calls cut transportation costs and make it possible to reach more people. However, this creates a challenge at the results deployment stage: it is nearly impossible to deploy results or administer the insurance cover to customers through telephone calls. Because the data reflect telephone promotion only, this reliance on telephone calls limits how the model's results can be used in deployment.
d) Hit rates
Classification table (no = unsuccessful sale, yes = successful sale)
          no   yes Total
no       209    22   231
yes       75    93   168
Total    284   115   399
The hit rates are calculated as follows:
Hit rate for yes = 115 / 399 = 0.2882
Hit rate for no = 284 / 399 = 0.7118
Here, yes refers to successful sales and no refers to unsuccessful sales. The hit rate for yes is 0.2882, which implies a 0.2882 (28.82%) chance of making a sale. This suggests low efficiency among the sales personnel.
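The arithmetic above can be checked with a short Python sketch. It stores the classification table as nested dictionaries (a layout assumed for illustration, with outer keys as row labels and inner keys as column labels) and reproduces the calculation used in the text: each column total divided by the grand total of 399.

```python
# Classification table from section (d): outer keys are row labels,
# inner keys are column labels, as laid out in the table.
table = {
    "no": {"no": 209, "yes": 22},
    "yes": {"no": 75, "yes": 93},
}


def column_hit_rates(counts):
    """Share of the grand total that falls in each column, as computed in the text."""
    columns = next(iter(counts.values())).keys()
    grand_total = sum(sum(row.values()) for row in counts.values())
    return {
        col: sum(row[col] for row in counts.values()) / grand_total
        for col in columns
    }


if __name__ == "__main__":
    for label, rate in column_hit_rates(table).items():
        print(f"Hit rate for {label} = {rate:.4f}")
```

Running it prints the two rates to four decimal places, matching 0.7118 for no and 0.2882 for yes.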
e) Prerequisites and limitations of the data mining project
Prerequisites
The following prerequisites may contribute to the success of the data mining project described in (a) above:
i) Project requirements: These include the decisions necessary for the data analytics project.
ii) Data: Raw collected data to support the data analytics process.
iii) Tools: All tools needed to collect, transform, analyse and model the data are key to the success of the project.
iv) Deployment requirements: The methods and infrastructure used to deploy the results help determine whether the data mining project succeeds.
Limitations
i) Violation of privacy: Data mining can violate the privacy of the people from whom data are collected, so the safety and security of those people is not guaranteed. This could result in miscommunication between people, hindering the success of the data mining project.
ii) Irrelevant information: Too much information may be collected, and not all of it is relevant to the data mining process. Using irrelevant information could lead to undesirable results and hamper the success of the data mining project.
iii) Misuse of information: Because the security and safety of users may be minimal, information in the data mining process could be misused, even to harm others. This undermines the success of the data mining project.
f) Association analysis
Association analysis explores relationships among data variables (Sun, 2014).
Objective of data mining
The main aim of data mining is to extract meaning from large amounts of raw, often unstructured, data by exploring patterns, trends and associations among data variables. The company could improve its revenue by exploring the associations between the factors that affect revenue and the revenue generated.
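Association analysis is often carried out as association rule mining, which scores rules of the form "customers who buy A also buy B" by support, confidence and lift. Support is the share of transactions that contain the items, confidence is support(A and B) / support(A), and lift divides the confidence by support(B). The sketch below applies this to pairs of items in hypothetical insurance purchase records; the product names, the transactions and the 0.4 minimum support are assumptions for illustration, not the company's data.

```python
from itertools import combinations

# Hypothetical transactions: the set of insurance products each customer bought.
transactions = [
    {"car_insurance", "home_insurance"},
    {"car_insurance", "home_insurance", "travel_insurance"},
    {"car_insurance"},
    {"home_insurance", "travel_insurance"},
    {"car_insurance", "home_insurance"},
]


def support(itemset, baskets):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for basket in baskets if itemset <= basket) / len(baskets)


def pair_rules(baskets, min_support=0.4):
    """Return (lhs, rhs, support, confidence, lift) for rules lhs -> rhs
    built from item pairs whose joint support is at least min_support."""
    items = sorted(set().union(*baskets))
    rules = []
    for a, b in combinations(items, 2):
        joint = support({a, b}, baskets)
        if joint < min_support:
            continue  # pair too rare to be worth reporting
        for lhs, rhs in ((a, b), (b, a)):
            confidence = joint / support({lhs}, baskets)
            lift = confidence / support({rhs}, baskets)
            rules.append((lhs, rhs, joint, confidence, lift))
    return rules


if __name__ == "__main__":
    for lhs, rhs, sup, conf, lift in pair_rules(transactions):
        print(f"{lhs} -> {rhs}: support={sup:.2f}, confidence={conf:.2f}, lift={lift:.2f}")
```

A lift above 1 suggests two products are bought together more often than chance alone would predict. For larger datasets, an algorithm such as Apriori is normally used so that not every item combination has to be enumerated.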
Data to be used
All data held by the company, whether possessed directly or indirectly, will be used for the association analysis.
Application of data mining to help increase company revenue
Data mining can help the company decrease costs, thereby increasing its profit margins. It can also be used to identify potential customers, which widens the company's market base and so increases its revenue.
Data mining can also help ensure a good customer service experience. Satisfied customers are more likely to come back for more of the company's products, which increases the company's revenue.
Finally, data mining supports competitive intelligence: it helps the company identify ways in which it can perform better than its competitors, and identifying and acting on these factors improves the company's revenue.
References
Stephen, M. (2017). Simultaneous use of customer, product and inventory information in dynamic
product promotion. International Journal of Production Research, 4.
Sun, W. (2014). A Likelihood-Based Framework for Association Analysis of Allele-Specific Copy Numbers.
Journal of the American Statistical Association, 3.