Comparing Data Science Project Management Methodologies - MITS5502
Added on 2022/12/19

Assignment Title: Comparing Data
Science Project Management
Methodologies
Name:
Student Id:
Student Name: Student Id: 1
Contents
Introduction
Data science process
A need for an improved methodology
Methodology
1. Scrum Agile Method
2. Agile Kanban
3. CRISP-DM
4. Baseline
Findings
Conclusion
References

Introduction
Data science is an emerging field that combines several domains, including data management, software development and statistics. [9] Because of growing data collection capabilities, the availability of data storage and advances in data analysis technology, the field of data science is growing rapidly. This growth has created a need to research which methodology best enables effective communication and coordination in data science projects. In response to this need, I chose to research the methodologies that can be applied in data science. Applying a good methodology can benefit a data science team considerably, for example in selecting a suitable data architecture, identifying appropriate analytical techniques and choosing the members to include in the team.
In this research paper, I discuss four methodologies that can be applied in a data science project: Agile Scrum, Agile Kanban, CRISP-DM and a baseline. I also describe how these methodologies can be compared through a controlled experiment to determine which performs better. Finally, I conclude which methodology is best suited to data science projects.
Data science process
The data science process consists of acquiring and extracting data, combining data, modelling and analysing the data, interpreting reports and drawing inferences. [1] Earlier, software development was thought of as an individual process: a classic phased development model was defined, and each phase consisted of a series of tasks. With the rapid growth of technology, a methodology was needed that would allow a team to coordinate during the development process.
A need for an improved methodology
People who work in data science solve problems by analysing data and answer questions by interpreting the resulting reports. An improved methodology is needed to help the team focus on task coordination, process and technology. Without such a methodology, one study reports that 55% of data science projects are not completed and that some fail to meet their objectives. Because there are many reasons why data science projects are
not completed, a well-defined methodology helps the team identify these reasons in advance. A well-defined methodology is also believed to improve the quality of the results.
Methodology
In order to compare the methodologies, we need to conduct a controlled experiment. A controlled experiment is one in which all input factors except the one under study (here, the methodology) are kept constant. The project is carried out by groups with the same number of members, and every group is given the same project description. Because we are comparing four methodologies, we need four groups, and each group completes the project using a different methodology. The groups need a good understanding of these methodologies. [10] Before the experiment starts, team effectiveness is measured to ensure that every team is equally effective. The methodologies used in the controlled experiment are discussed below.
1. Scrum Agile Method
In the Scrum method, the development team is not given a detailed description of how the project is to be done. These details are left to the developers, who focus on the goal of the project. Team members hold sprint meetings to discuss the desired outcome of the project rather than working from a set of task definitions, entry criteria and validation criteria. [6]
In Scrum there is no overall team leader who decides what is to be done and by whom; decisions are made as a team, which helps the team make good decisions. Besides the developers, the Scrum method defines two roles. The first is the ScrumMaster, who works as a coach and helps the team carry out the practices of the method. The second is the client, or product owner, who represents the user, customer or business and guides the development team through the development process. [6]
The team members meet at the start of the experiment to agree how often they will meet and which tasks they will perform at each meeting. The team divides the task into small features, then codes and tests the functionality of each feature. Every team member, including the coach and the client, is expected to attend the daily scrum meeting.
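The sprint planning described above can be sketched in code. This is a minimal, hypothetical model rather than part of the original experiment design: the fixed number of features per sprint, the 14-day sprint length and the names `Sprint` and `plan_sprints` are assumptions for illustration. It shows the task being split into small features, the features being grouped into time-boxed sprints, and a sprint's scope being frozen once it starts, reflecting the Scrum rule that work is not changed in the middle of a sprint.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sprint:
    """A time-boxed sprint whose scope is frozen once it starts (hypothetical model)."""
    number: int
    length_days: int
    features: List[str] = field(default_factory=list)
    started: bool = False

    def add_feature(self, feature: str) -> None:
        # Scope may change only during sprint planning, i.e. before the sprint starts.
        if self.started:
            raise RuntimeError(f"sprint {self.number} has started; its scope is frozen")
        self.features.append(feature)

    def start(self) -> None:
        self.started = True


def plan_sprints(features: List[str], per_sprint: int, length_days: int = 14) -> List[Sprint]:
    """Group the task's small features, in order, into sprints of at most per_sprint items."""
    if per_sprint < 1:
        raise ValueError("per_sprint must be at least 1")
    sprints: List[Sprint] = []
    for begin in range(0, len(features), per_sprint):
        sprint = Sprint(number=len(sprints) + 1, length_days=length_days)
        for feature in features[begin:begin + per_sprint]:
            sprint.add_feature(feature)
        sprints.append(sprint)
    return sprints
```

For example, `plan_sprints(["load data", "clean data", "train model", "evaluate model", "write report"], per_sprint=2)` yields three sprints, the last holding only "write report"; calling `add_feature` on a sprint after `start()` raises `RuntimeError`.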
2. Agile Kanban
Agile kanban is a type of agile methodology that combine phases to complete data science
integrated by pipeline process management. [7, 10] Kanban is applied in many ares including
software development. Kanban method is a methodology that is designed to help the team to
work effectively. Kanban methodology is based on three principles;
First, it enhances the flow of development: the next highest-priority item is pulled from the backlog into development. Second, limiting the work in progress helps balance the amount of work the team commits to. Third, work items are visualized, which makes the state of the work easy to see. [10]
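To make the three principles concrete, the following is a minimal sketch of a Kanban board with a prioritized backlog, a work-in-progress (WIP) limit and a column view. The priority numbers, column names, WIP limit and the class name `KanbanBoard` are assumptions for illustration, not features of any particular Kanban tool. Items are pulled one at a time, and only while the WIP limit leaves room.

```python
import heapq
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class KanbanBoard:
    """Minimal Kanban board: prioritized backlog, WIP-limited 'in progress', and 'done'."""
    wip_limit: int
    backlog: List[Tuple[int, int, str]] = field(default_factory=list)  # heap of (priority, seq, item)
    in_progress: List[str] = field(default_factory=list)
    done: List[str] = field(default_factory=list)
    seq: int = 0

    def add(self, item: str, priority: int) -> None:
        # Lower number = higher priority; seq preserves insertion order for equal priorities.
        heapq.heappush(self.backlog, (priority, self.seq, item))
        self.seq += 1

    def pull(self) -> Optional[str]:
        """Pull the next highest-priority item, one at a time, only if the WIP limit allows."""
        if len(self.in_progress) >= self.wip_limit or not self.backlog:
            return None
        _, _, item = heapq.heappop(self.backlog)
        self.in_progress.append(item)
        return item

    def finish(self, item: str) -> None:
        # Finishing an item frees a WIP slot, so the next item can be pulled.
        self.in_progress.remove(item)
        self.done.append(item)

    def view(self) -> Dict[str, List[str]]:
        """Visualize the board as columns, with the backlog shown in priority order."""
        return {
            "backlog": [item for _, _, item in sorted(self.backlog)],
            "in progress": list(self.in_progress),
            "done": list(self.done),
        }
```

With `wip_limit=2`, a third `pull()` returns `None` until one of the two items in progress is finished, which is how the limit balances the amount of work the team commits to.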
Comparison of Kanban and Scrum methodology
Kanban | Scrum
No roles are prescribed. | The roles of Scrum Master and Product Owner are prescribed.
Delivery is continuous. | Delivery is time-boxed in sprints.
One item is pulled at a time, which keeps work flowing smoothly. | A batch of work is pulled into the system for each sprint.
Work can be changed at any time. | No changes can be made in the middle of a sprint.
3. CRISP-DM
CRISP-DM (Cross-Industry Standard Process for Data Mining) is a methodology that follows well-defined phases to achieve the goal of a data science project. [3] The phases are understanding the business objectives, data understanding, data preparation, data modelling, evaluation and deployment of the model. The team works through these phases in order but can also return to earlier phases when needed. [4]
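The navigation rule described above can be expressed as a small state tracker. This sketch assumes that the team advances one phase at a time but may jump back to any earlier phase (for example from evaluation back to data preparation); the names `Phase` and `CrispDmProject` are hypothetical and not part of CRISP-DM itself.

```python
from enum import IntEnum


class Phase(IntEnum):
    """The six CRISP-DM phases, in their usual order."""
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELLING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


class CrispDmProject:
    """Tracks the current phase: advance one phase at a time, or go back to any earlier one."""

    def __init__(self) -> None:
        self.phase = Phase.BUSINESS_UNDERSTANDING
        self.history = [self.phase]  # every phase visited, in order, including revisits

    def advance(self) -> Phase:
        if self.phase is Phase.DEPLOYMENT:
            raise ValueError("deployment is the final phase")
        self.phase = Phase(self.phase + 1)
        self.history.append(self.phase)
        return self.phase

    def go_back(self, target: Phase) -> Phase:
        # Returning to an earlier phase is allowed; skipping ahead is not.
        if target >= self.phase:
            raise ValueError("can only go back to an earlier phase")
        self.phase = target
        self.history.append(self.phase)
        return self.phase
```

Keeping a `history` list also gives the team a simple record of how often it had to revisit earlier phases.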
i. Understanding the business objectives
This is the stage where the business roles are identified and the project description is analysed. Poor evaluation of the project description can affect the results of the entire project.
ii. Data understanding
In this phase, the team must understand the kind of data being collected. A list of all data sources and their locations is prepared, and the possible challenges in acquiring the data are identified. A record of these challenges is kept, which can help other teams that may need to carry out a similar data science project.
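A minimal sketch of the data-source inventory and challenge record described above. The record fields mirror the text (source, location, acquisition challenges), while the names `DataSource` and `challenge_log` and the example entries are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataSource:
    """One entry in the data-source inventory prepared during data understanding."""
    name: str
    location: str
    challenges: List[str] = field(default_factory=list)  # problems met when acquiring the data


def challenge_log(sources: List[DataSource]) -> List[str]:
    """Flatten per-source acquisition challenges into one log that other teams can reuse."""
    return [f"{s.name} ({s.location}): {c}" for s in sources for c in s.challenges]
```

For instance, a source recorded with no challenges contributes nothing to the log, so the log lists only the problems worth passing on.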
iii. Data preparation
In this stage, data collection and data cleaning are carried out. Data cleaning raises the quality of the selected data to the level required by the chosen analysis technique. A report, called the data cleaning report, records how data quality problems were handled.
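The document does not say which quality checks are applied. As one hypothetical example, the sketch below treats missing required fields and exact duplicate records as the data quality problems, and returns a small data cleaning report alongside the cleaned rows. The function `clean` and the `CleaningReport` fields are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class CleaningReport:
    """Summary of how data quality problems were handled (the 'data cleaning report')."""
    rows_in: int
    dropped_missing: int
    dropped_duplicates: int
    rows_out: int


def clean(rows: List[Dict], required: List[str]) -> Tuple[List[Dict], CleaningReport]:
    """Drop rows missing any required field, then exact duplicates; values must be hashable."""
    complete = [r for r in rows if all(r.get(k) not in (None, "") for k in required)]
    seen = set()
    cleaned = []
    for r in complete:
        key = tuple(sorted(r.items()))  # keys are unique, so only keys are ever compared
        if key not in seen:
            seen.add(key)
            cleaned.append(r)
    report = CleaningReport(
        rows_in=len(rows),
        dropped_missing=len(rows) - len(complete),
        dropped_duplicates=len(complete) - len(cleaned),
        rows_out=len(cleaned),
    )
    return cleaned, report
```

In a real project the checks would come from the data understanding phase; the point of the sketch is that every cleaning step is counted in the report.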
iv. Data modelling
In this phase, the modelling technique is selected. Before the team actually builds the model, a test design is required. This test design is later used to check whether the actual model meets expectations. The model is then built using the chosen modelling tool. [11]
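A sketch of fixing the test design before any model is built. It assumes the test design consists of a reproducible hold-out split plus an acceptance threshold on mean absolute error; a least-squares line stands in for "the chosen modelling tool". All names (`ModelTestDesign`, `split`, `fit_line`, `mean_absolute_error`) and parameter values are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class ModelTestDesign:
    """Fixed before any model is built; every value here is an illustrative assumption."""
    test_fraction: float  # share of the data held out for testing
    seed: int             # makes the hold-out split reproducible
    max_mae: float        # acceptance threshold on mean absolute error


def split(data: List[Point], design: ModelTestDesign) -> Tuple[List[Point], List[Point]]:
    """Shuffle with a fixed seed and hold out test_fraction of the data as the test set."""
    shuffled = list(data)
    random.Random(design.seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * design.test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(train: List[Point]) -> Tuple[float, float]:
    """Least-squares fit of y = a + b*x, standing in for 'the chosen modelling tool'."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mean_absolute_error(model: Tuple[float, float], test: List[Point]) -> float:
    intercept, slope = model
    return sum(abs(y - (intercept + slope * x)) for x, y in test) / len(test)
```

Because the split, metric and threshold are frozen in `ModelTestDesign` first, the model cannot be tuned against the test set after the fact.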
v. Evaluation
In this phase, the model is evaluated to see whether it meets the business objectives. The model is tested on actual data, and if it produces the expected results it is accepted. If the model does not meet the client's expectations, recommendations are made about the next step.
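The evaluation decision can be sketched as a comparison against an agreed threshold. This assumes the business objective has been translated into a maximum mean absolute error; the function `evaluate_model` and the wording of the suggested next steps are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EvaluationResult:
    metric_value: float
    threshold: float
    accepted: bool
    next_step: str


def evaluate_model(predictions: List[float], actuals: List[float], max_mae: float) -> EvaluationResult:
    """Compare predictions with actual outcomes against a business-agreed MAE threshold."""
    if len(predictions) != len(actuals) or not actuals:
        raise ValueError("need equal-length, non-empty predictions and actuals")
    error = sum(abs(p - a) for p, a in zip(predictions, actuals)) / len(actuals)
    if error <= max_mae:
        return EvaluationResult(error, max_mae, True, "proceed to deployment")
    # Not good enough: CRISP-DM allows returning to an earlier phase.
    return EvaluationResult(error, max_mae, False, "revisit business understanding or modelling")
```

For predictions `[10.0, 12.0]` against actuals `[11.0, 12.0]` the error is 0.5, so with `max_mae=1.0` the model is accepted.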
vi. Deployment
This is the final phase of the methodology, in which the model is put into actual use. A deployment strategy is prepared that describes in detail how the model will be deployed.
4. Baseline
To find out whether following a methodology matters at all in a data science project, one group is not given any methodology to follow. This group carries out its tasks however it chooses.
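Putting the four arms together, the experimental setup from the Methodology section can be sketched as follows. The document does not say how groups are allocated or how "same effectiveness" is judged. This sketch assumes random allocation and a numeric pre-experiment effectiveness score per team, compared against an arbitrary tolerance. The names `assign_groups` and `balanced` and the group labels are hypothetical.

```python
import random
from typing import Dict, List

ARMS = ["Agile Scrum", "Agile Kanban", "CRISP-DM", "Baseline (no methodology)"]


def assign_groups(groups: List[str], seed: int) -> Dict[str, str]:
    """Randomly give each of the four groups a unique methodology (one arm per group)."""
    if len(groups) != len(ARMS):
        raise ValueError(f"need exactly {len(ARMS)} groups")
    arms = list(ARMS)
    random.Random(seed).shuffle(arms)  # fixed seed makes the allocation reproducible
    return dict(zip(groups, arms))


def balanced(effectiveness: Dict[str, float], tolerance: float) -> bool:
    """True if pre-experiment team-effectiveness scores differ by at most `tolerance`."""
    scores = list(effectiveness.values())
    return max(scores) - min(scores) <= tolerance
```

If `balanced` returns `False`, the teams would need to be rearranged before the experiment starts, since otherwise team differences could be mistaken for methodology differences.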
Experiment findings
After all the groups have finished, they present their findings. The results are observed and compared against each other, and a conclusion is drawn based on the quality of the results. A good methodology should produce high-quality results that meet the goal of the data science project.
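The document does not define how result quality is scored. As a hypothetical illustration only, with no experimental data implied, the comparison step could combine several quality criteria with weights and rank the arms. The criteria names, weights and the function `rank_results` are assumptions.

```python
from typing import Dict, List, Tuple


def rank_results(scores: Dict[str, Dict[str, float]], weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Rank each arm by a weighted average of its quality criteria (criteria/weights assumed)."""
    total = sum(weights.values())
    ranked = []
    for arm, criteria in scores.items():
        quality = sum(weights[c] * criteria[c] for c in weights) / total
        ranked.append((arm, quality))
    # Highest overall quality first.
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

Making the weights explicit forces the team to agree, before seeing the results, on what "high-quality results" means for the project goal.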
Conclusion
In conclusion, research shows that Agile Kanban is a better methodology than Scrum and CRISP-DM. The team members' satisfaction with working
together and their willingness are also key factors to consider when comparing data science project methodologies. The comparison is also strongly affected by the team members' backgrounds, and it is difficult to assemble teams whose members have the same background.
References
[1] C. Shearer, "The CRISP-DM model: The new blueprint for data mining," Journal of Data
Warehousing, vol. 5, no. 4, pp. 13-22, 2000.
[2] G. Mariscal, O. Marban, and C. Fernandez, "A survey of data mining and knowledge discovery process models and methodologies," The Knowledge Engineering Review, vol. 25, no. 2, pp. 137-166, 2010.
[3] G. Piatetsky, "CRISP-DM, still the top methodology for analytics, data mining, or data science projects," KDnuggets, Oct. 28, 2014. Available: http://www.kdnuggets.com/2014/10/crisp-dm-top-methodology-analytics-data-mining-data-science-projects.html
[4] S. Huber et al., "DMME: Data mining methodology for engineering applications – a holistic extension to the CRISP-DM model," Procedia CIRP, vol. 79, pp. 403-408, 2019.
[5] R. Levy, M. Short, and P. Measey, Agile Foundations: Principles, Practices and Frameworks. BCS, 2015.
[6] J. López-Martínez et al., "Problems in the adoption of agile-scrum methodologies: A systematic literature review," in 2016 4th International Conference in Software Engineering Research and Innovation (CONISOFT), IEEE, 2016.
[7] M. O. Ahmad, J. Markkula, and M. Oivo, "Kanban in software development: A
systematic literature review," in Software Engineering and Advanced Applications (SEAA),
2013 39th EUROMICRO Conference on, 2013, pp. 9-16: IEEE.
[8] N. W. Grady, M. Underwood, A. Roy, and W. L. Chang, "Big Data: Challenges, practices
and technologies: NIST Big Data Public Working Group workshop at IEEE Big Data 2014,"
in Big Data (Big Data), 2014 IEEE International Conference on, 2014, pp. 11-15: IEEE.
[9] W. Van Der Aalst, "Data science in action," in Process Mining. Berlin, Heidelberg: Springer, 2016, pp. 3-23.
[10] G. P. Way et al., "A comparison of methodologies to test aggression in zebrafish," Zebrafish, vol. 12, no. 2, pp. 144-151, 2015.
[11] Z. Soh, Z. Sharafi, B. Van den Plas, G. C. Porras, Y.-G. Guéhéneuc, and G. Antoniol,
"Professional status and expertise for UML class diagram comprehension: An empirical
study," in Program Comprehension (ICPC), 2012 IEEE 20th International Conference on,
2012, pp. 163-172: IEEE.