B9DA103 Data Mining: Critique of CRISP-DM Model for Big Data Mining

Added on 2022/08/17

AI Summary

This report provides a comprehensive critique of the CRISP-DM (Cross-Industry Standard Process for Data Mining) model, evaluating its applicability and effectiveness in the context of Big Data mining. The report begins with an introduction to the CRISP-DM model, outlining its six stages and highlighting its significance in guiding data mining projects. The critique delves into several limitations of the model, including a lack of detailed business understanding, issues related to data preparation and modeling, and challenges in deployment and iteration. The analysis is supported by a review of related journal articles published after 2012, which further explores the model's strengths and weaknesses. The report then shifts to a critical analysis of a 'Big Data' mining problem domain, proposing appropriate data mining tools and techniques to meet an organization's needs for business intelligence, highlighting the benefits to the business along with measurable implementation success criteria. The report concludes by summarizing the key findings and emphasizing the importance of addressing the identified issues to ensure successful data mining projects. The student's assignment offers valuable insights into data mining best practices and challenges.

Running head: B9DA103 DATA MINING
B9DA103 Data Mining
Name of the Student
Name of the University
Author's note
We have no known conflict of interest to disclose.
The following paper is a critique of "The CRISP-DM Model: The New Blueprint for Data
Mining".

Introduction
The CRISP-DM model, or the Cross-Industry Standard Process for Data Mining, is a
framework that helps data analysts design, develop, build, test and finally deploy
machine learning solutions for problems in different industries and domains.
This framework takes the business problem into account and defines the required data
mining tasks independently of both the technology used and the application area, which
makes the data mining process easier for small and medium-sized organizations.
Through its six stages (business understanding, data understanding, data preparation,
modeling, evaluation and deployment) it provides a road map for planning and carrying out
a specific data mining project. The details of the different stages follow.
Business understanding is the first stage, in which the main focus is on gaining a clear
understanding of the objectives of the proposed project by analyzing the issues from the
business perspective. After the situation has been analyzed, the issues are converted into
specific data mining problems. A preliminary plan is then drawn up to achieve the specific
business objectives.
To determine the data mining process and technique, as well as the data that
needs to be analyzed later, it is crucial for the data analysts to understand fully the
business problem that they are trying to solve by analyzing and mining the data.
Lack of clarity in understanding the details
When the CRISP-DM model is used, instead of drilling down into the details of the
business problem to gain enough clarity about it and about how a proposed
analytic solution might help to mitigate the issue, the appointed analytics project team
settles for some business goals and metrics with which to measure the success of the data
analytics/mining project. In this way the analytics team gains only a high-level understanding
of the primary business objective, minimizes the overhead and moves straight on to analyzing
the stored data. Most of the time, projects run in this way fail to meet the primary objective:
they produce interesting models that may not meet the actual business
need of the organization in its specific industry.
Furthermore, data is generated from diverse sources, and when the CRISP-DM methodology is
used to solve a business issue, ensuring the availability of this data and its inclusion in the
analytics process must be a repeatable process in the deployed code.
When the CRISP-DM methodology is used, selecting an appropriate level of precision
for the targeted problem is important. It helps to ensure that the data scientists in the
project team deliver the results or models that are required without wasting too much
time on preparation or on tuning the algorithms to gain accuracy that
the organization cannot use in a timely manner to exploit specific opportunities in
the business area.
The modeling phase includes activities such as the selection of a modeling technique,
generation of a test methodology, development of the models and finally assessment of the
performance of the developed models. Numerous modeling
techniques can be selected and applied, with their parameters calibrated to find the
optimal values. Naturally, several techniques exist that can be used for the same data mining
problem in a given scenario. Some of these techniques have specific requirements on the form
of the input data. Thus, at this point, stepping back to the data preparation stage is necessary
to meet the requirements of the model. Repeatedly going back to the preparation
phase while selecting the modeling technique is time-consuming and increases cost.
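The loop back from modeling to data preparation described above can be made visible early. The following is a minimal sketch, not part of the CRISP-DM specification: the `Technique` class, the requirement flags and the sample rows are all hypothetical, chosen only to illustrate how a team might check a prepared dataset against the input requirements of each candidate technique before committing to one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Technique:
    """A candidate modeling technique and the input properties it needs (illustrative)."""
    name: str
    needs_numeric_only: bool
    needs_no_missing: bool


def dataset_profile(rows):
    """Summarise the properties of a prepared dataset that techniques care about."""
    values = [v for row in rows for v in row.values()]
    return {
        "has_missing": any(v is None for v in values),
        "has_non_numeric": any(
            v is not None and not isinstance(v, (int, float)) for v in values
        ),
    }


def preparation_gaps(technique, profile):
    """List the data preparation steps that must be redone before this technique can run."""
    gaps = []
    if technique.needs_no_missing and profile["has_missing"]:
        gaps.append("impute or drop missing values")
    if technique.needs_numeric_only and profile["has_non_numeric"]:
        gaps.append("encode categorical fields as numbers")
    return gaps


# Hypothetical prepared data: one missing income and one categorical field.
rows = [
    {"age": 34, "income": 52000.0, "region": "north"},
    {"age": 51, "income": None, "region": "south"},
]

# Hypothetical requirement declarations, for illustration only.
candidates = [
    Technique("decision tree (tolerant implementation)", False, False),
    Technique("logistic regression", True, True),
]

profile = dataset_profile(rows)
for technique in candidates:
    gaps = preparation_gaps(technique, profile)
    print(technique.name, "->", gaps if gaps else "ready to model")
```

Listing the gaps for every candidate at once lets the team plan a single return to the preparation phase instead of discovering each requirement one technique at a time.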
As data analytics platforms become simpler day by day and
allow data scientists to develop and deploy sophisticated machine learning or analysis
models, important stages and decisions, such as the treatment of missing data in the
selected dataset or the creation of new synthetic features, are lost [3]. This may lead to a loss
of quality in the model or solution developed for the business issue.
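One way to keep these decisions from being lost is to write them down as explicit steps. The sketch below is only an illustration under stated assumptions: the field names, the choice of median imputation and the debt-to-income ratio are hypothetical and are not taken from the cited work.

```python
from statistics import median


def impute_median(rows, field):
    """Fill missing values of `field` with the median of the observed values.

    Returns the new rows and a note recording the decision, so that the
    treatment stays visible instead of being hidden inside a tool.
    """
    observed = [r[field] for r in rows if r[field] is not None]
    fill = median(observed)
    filled = [dict(r, **{field: fill if r[field] is None else r[field]}) for r in rows]
    n_missing = len(rows) - len(observed)
    note = f"{field}: {n_missing} missing value(s) replaced with median {fill}"
    return filled, note


def add_ratio_feature(rows, numerator, denominator, name):
    """Add a synthetic feature numerator / denominator to every row.

    Assumes the denominator is never zero; a real project would have to
    decide how to treat that case as well.
    """
    return [dict(r, **{name: r[numerator] / r[denominator]}) for r in rows]


# Hypothetical records with one missing income.
rows = [
    {"income": 40000.0, "debt": 10000.0},
    {"income": None, "debt": 30000.0},
    {"income": 60000.0, "debt": 15000.0},
]

prepared, note = impute_median(rows, "income")
prepared = add_ratio_feature(prepared, "debt", "income", "debt_to_income")

print(note)
for row in prepared:
    print(row)
```

Because each treatment returns a note or a named feature, the preparation choices can be reviewed with the business stakeholders rather than being buried in a platform default.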
Iterative and unnecessary rework
In solving business issues, analytic teams often assess the project they have undertaken and
its results in purely analytic terms. For example, if the project is about a prediction model,
then the model must be good at predicting the desired aspect. It is important to realize that
the accuracy or goodness of a model is not enough on its own: the analytic results must also
be checked against the business objectives. Measuring the success of a data analysis project is
very difficult without clarity on the business problem faced by the organization. When the
developed analytic solution does not seem to meet the business objectives of the project,
teams most of the time try to find new modeling techniques. As mentioned
in the journal, this is not the right response; they should instead work with the business
organization and other stakeholders to re-evaluate the business problem to be solved.
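The gap between analytic and business measures of success can be shown with a small worked example. This is a sketch under assumed figures: the churn scenario, the two sets of predictions and the cost of 500 per missed churner and 50 per unnecessary retention offer are all hypothetical.

```python
def accuracy(actual, predicted):
    """Share of cases in which the prediction matches the actual outcome."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def business_cost(actual, predicted, cost_missed, cost_false_alarm):
    """Total cost of the errors, weighted by what each kind of error costs the business."""
    missed = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))
    false_alarms = sum(a == 0 and p == 1 for a, p in zip(actual, predicted))
    return missed * cost_missed + false_alarms * cost_false_alarm


# Hypothetical outcomes for ten customers: 1 = churned, 0 = stayed.
actual = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
pred_a = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # cautious model, misses two churners
pred_b = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]  # eager model, three false alarms

# Assumed costs: losing a customer is far more expensive than a wasted offer.
COST_MISSED = 500
COST_FALSE_ALARM = 50

for name, pred in (("A", pred_a), ("B", pred_b)):
    print(
        name,
        "accuracy:", accuracy(actual, pred),
        "cost:", business_cost(actual, pred, COST_MISSED, COST_FALSE_ALARM),
    )
```

With these assumed figures, model A is the more accurate (0.8 against 0.7) but costs the business 1000, while model B costs 150. A team that judged the project on accuracy alone would pick the model that serves the business objective worse.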
Deployment issues
In this method the main emphasis is on the development of the model by the analytic
teams, and most of the time the teams do not think about the deployment and
operationalization of the models they develop. It is important to recognize that the developed
models need to be applied to live data in different data stores or may be embedded in
systems that are already operational. While developing a model, the team needs to keep in
mind that it must be easy to deploy on live data sources to be usable by the
organization; if the model is hard or impossible to deploy, it will not be of real
use for the problems it was intended to solve in the first place. Neglecting deployment in this
way increases the time and cost required to deploy a model and leads to a large
number of models that are developed but never have a business impact in their organizations.
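One common way to make a model easier to operationalize is to ship the preparation logic and the model together as a single artefact. The sketch below assumes a simple linear scoring model; the `ScoringPipeline` class, its field names and its coefficients are hypothetical and serve only to illustrate the idea.

```python
import json


class ScoringPipeline:
    """Bundles preparation parameters and model weights so that one artefact can be deployed."""

    def __init__(self, fill_values, weights, bias):
        self.fill_values = fill_values  # value used when a live field is missing
        self.weights = weights          # hypothetical linear model coefficients
        self.bias = bias

    def prepare(self, record):
        """Apply the same missing-value treatment that was used during model building."""
        return {
            field: self.fill_values[field] if record.get(field) is None else record[field]
            for field in self.weights
        }

    def score(self, record):
        """Score one raw live record: prepare it, then apply the linear model."""
        x = self.prepare(record)
        return self.bias + sum(self.weights[f] * x[f] for f in self.weights)

    def to_json(self):
        """Serialise everything needed to score, so it can be handed to an operational system."""
        return json.dumps(
            {"fill_values": self.fill_values, "weights": self.weights, "bias": self.bias}
        )

    @classmethod
    def from_json(cls, text):
        """Rebuild the pipeline inside the operational system."""
        return cls(**json.loads(text))


# Built by the analytics team (all values are illustrative).
pipeline = ScoringPipeline(
    fill_values={"age": 40, "visits": 2},
    weights={"age": 0.5, "visits": 2.0},
    bias=1.0,
)

# Shipped as text and restored where the live data is.
restored = ScoringPipeline.from_json(pipeline.to_json())

live_record = {"age": None, "visits": 3}  # raw record with a missing field
print(restored.score(live_record))
```

Because the restored object accepts raw records, the operational system does not need to reimplement the preparation steps, which is one of the usual sources of deployment cost.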
Failure to iterate as needs change
As the data and needs of the organization vary, the developed models age
and become less effective for the organization and its needs and objectives. Therefore, it is
important to keep the developed models up to date if they are to remain valuable to the
organization. Analysts who develop models to solve a business issue know that the
circumstances of the business can change completely, which may undermine the value of a
developed and deployed model over time. In addition, the data trends that were the driving
forces behind a specific model can also change with time. Thus it is important to have enough
clarity on the business problem to estimate future changes and to
track and simulate the deployed model's business performance. However, analysts often do not
invest in or take the initiative for a revised model to replace the initial one. In this way the
models developed with the initial requirements and driving forces are left unmonitored and
unmaintained, which undermines the long-term value of the analytics model to an organization
in a specific industry.
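Tracking a deployed model does not have to be elaborate. As a minimal sketch, assuming the organization records one performance figure per period, the check below flags the periods in which the model has fallen too far below the level measured at deployment. The function name, the monthly figures and the tolerance of 0.05 are hypothetical.

```python
def periods_needing_review(baseline, observed_by_period, tolerance):
    """Return the periods in which performance fell more than `tolerance` below the baseline."""
    return [
        period
        for period, value in observed_by_period.items()
        if baseline - value > tolerance
    ]


# Accuracy measured when the model was deployed (illustrative figure).
baseline_accuracy = 0.85

# Accuracy measured on live data in later months (illustrative figures).
monthly_accuracy = {
    "2021-01": 0.84,
    "2021-02": 0.82,
    "2021-03": 0.76,
    "2021-04": 0.71,
}

flagged = periods_needing_review(baseline_accuracy, monthly_accuracy, tolerance=0.05)
print("Review or retrain after:", flagged)
```

The same check could be run on a business measure instead of accuracy, which would tie the decision to revise the model to the business problem rather than to the analytic score alone.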
Conclusion
From the analysis of the methodology described in the journal it can be stated that the
issues mentioned above increase the likelihood of developing an impressive analytic
solution/model that adds no business value for the organization at the end of the project.
Business organizations in different industries that need to exploit business opportunities
through analytics measures such as data mining, predictive analytics or machine learning
cannot afford these kinds of issues in data analytics projects that are intended to solve a
business problem.

Bibliography
[1] I. Wowczko, "A Case Study of Evaluating Job Readiness with Data Mining Tools and CRISP-DM
Methodology", International Journal for Infonomics, vol. 8, no. 3, pp. 1066-1070, 2015. Available:
10.20533/iji.1742.4712.2015.0126.
[2] H. Wiemer, L. Drowatzky and S. Ihlenfeldt, "Data Mining Methodology for Engineering Applications
(DMME): A Holistic Extension to the CRISP-DM Model", Applied Sciences, vol. 9, no. 12, p. 2407, 2019.
Available: 10.3390/app9122407.
[3] S. Murpratiwi, A. Narendra and M. Sudarma, "Mapping Patterns Achievement Based on CRISP-DM
and Self Organizing Maps (SOM) Methods", International Journal of Engineering and Emerging Technology, vol.
2, no. 1, p. 1, 2017. Available: 10.24843/ijeet.2017.v02.i01.p01.
[4] "Data Cleaning in Knowledge Discovery Database-Data Mining (KDD-DM)", International Journal of
Engineering and Advanced Technology, vol. 8, no. 63, pp. 2196-2199, 2019. Available:
10.35940/ijeat.f1100.0986s319.