Domain Ontologies and the Lifecycle of a Data Science Project

1 Background and Literature Review
The job of a data scientist is evolving with each passing day. My MSc project aims to help
data scientists approach a data science project better. The project will work as a
consultative system that assists a data science expert. This advisory system will be based
on the concept of a domain ontology. Ontologies, in general, standardise the common
characteristics of a particular field of research in order to ease research and development
work (Chiff, 2018).
1. Domain Ontologies
Before digging deep into the concepts of domain ontologies, the lifecycle of a project and
the various techniques involved, there are some basic terms and characteristics that should
be examined thoroughly.
1.1 WHAT IS AN ONTOLOGY?
In this era of information technology, every aspect of technology is evolving at every
passing moment (Maxim, 2018). The relationships, values and properties of a technology
also change as part of its evolution. The word 'ontology' has a broader meaning than its
meaning in the field of technology: in the dictionary sense, ontology is the study of the
existence of things.
In information technology, an ontology means the generally accepted statements of a field
of study. In artificial intelligence and data science, an ontology is a set of ideas,
interactions, relations and events through which researchers and analysts share information
with each other. These are the pre-agreed ground rules of the respective field of study
(Wang, 2016).
Every field, as it evolves, creates complexities and differences of opinion between experts.
Especially in technology, it has been observed that researchers, data scientists and
analysts often come up with modified definitions, rules and regulations for a field of study.
Machine learning, artificial intelligence, big data and data science are some of the
developing industries under information technology. Hence, experts believe that there should
be predefined values, arguments and interactions for each field of study, on the basis of
which experts can share their ideas, innovate products and revolutionize technologies
(Tandon, 2018). This idea of streamlining the foundation of the thought process is known as
an ontology.
Ontologies limit, and may overcome, the complexity of ideas and fundamentals in a field of
study. An ontology can also be described as the controlled vocabulary for an area of
technology that is maintained globally (Zheng, 2017). This accepted vocabulary is the
foundation on which researchers develop research papers that advance the field. Because
research papers are developed on the basis of ontologies, translating those papers becomes
easier; and for the global acceptance of those papers, translation is necessary (Lohr, 2018).
Ontologies provide the channel, or foundation, on which further studies of a particular
domain can be conducted. Consider a fact-based process in the field of data science
(Zhao, 2014), and treat it as a fundamental procedure of the field. If we try to explain it
or give it a name for future use, that name belongs in the ontology. Hence, an ontology is
the representation, or formal naming, of a set of concepts, procedures and relations between
entities (Kandel et al., 2011).
So, ontologies can be considered the ground rules. If you are playing a sport, there is a
set of rules that determines points, fouls, penalties and so on. Likewise, to develop a data
science project smoothly and successfully, you need ontologies to ensure a maximally
satisfactory outcome. These ground rules, the ontologies, must be developed thoughtfully, as
they will act as the foundation of the whole project (Loat, 2015).
1.2 WHAT IS THE “DOMAIN ONTOLOGY”?
A domain ontology represents a set of ideas belonging to a specific universe, for example
Biology, Politics, or Information Technology. These are three different domains whose
ontologies are entirely distinct from each other (Tei, 2017). The relations and
characteristics of the ontologies of these independent domains differ not only in meaning
but in value. The term "mouse" belongs both to Biology and to Information Technology, but
its interpretation is completely different in each: in Biology, a mouse is an animal,
whereas in Information Technology a mouse is a hardware input device for personal computers
(Zhao, 2014).
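As an illustrative sketch (not drawn from the cited literature), the two senses of "mouse"
can be modelled as distinct classes in separate domain namespaces. The example below uses
Python's rdflib library; the namespace IRIs and class names are invented for illustration:

```python
from rdflib import Graph, Literal, Namespace, RDF, RDFS
from rdflib.namespace import OWL

# Hypothetical namespaces for two independent domain ontologies.
BIO = Namespace("http://example.org/biology#")
IT = Namespace("http://example.org/infotech#")

g = Graph()
g.bind("bio", BIO)
g.bind("it", IT)

# The same label, "mouse", names two unrelated concepts.
g.add((BIO.Mouse, RDF.type, OWL.Class))
g.add((BIO.Mouse, RDFS.subClassOf, BIO.Animal))
g.add((BIO.Mouse, RDFS.label, Literal("mouse")))

g.add((IT.Mouse, RDF.type, OWL.Class))
g.add((IT.Mouse, RDFS.subClassOf, IT.InputDevice))
g.add((IT.Mouse, RDFS.label, Literal("mouse")))

print(g.serialize(format="turtle"))
```

The graph holds two classes with the same label but entirely separate positions in their
respective hierarchies, which is exactly why the domain must be fixed before a term can be
interpreted.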
The concept of a domain ontology revolves around the relations and concepts of one domain
of study. Human understanding has evolved, so even within the same field of interest,
ontologies can diverge for obvious reasons: the authors' perspectives, their vision, their
approach to a problem, their way of executing a project, and so on (Endel, 2015). Hence, it
is practically impossible to develop a complete domain ontology for a domain: it is not only
expensive, but it also requires an investment of time and resources to keep up with the
objective in this ever-changing world (Kandel et al., 2011).
Depending on the requirements of the project, it is imperative that the domain ontology is
defined before development starts, so that certain concepts, relations between properties,
values and so on are predetermined. Based on those values, better development becomes
possible at every level (Brooks-Bartlett, 2018).
However, as already stated, it is practically impossible to cover a whole domain with one
set of ontologies. The world changes rapidly over short spans of time, so all the resources,
time and money invested in developing a set of ontologies may not help in the future,
because those domain ontologies will no longer be relevant. At most, you can create a core
domain ontology that covers only the basic classes of the domain. As already discussed,
dynamic classes added to an ontology may not stay relevant over time, so defining just the
basic classes of a domain is a good start when developing a domain ontology. A minimal
sketch of such a core ontology is shown below.
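As a sketch of that advice, assuming Python's owlready2 library (a library commonly used
alongside Protégé) and class names invented for illustration, a core domain ontology for
data science might start with just a handful of basic classes:

```python
from owlready2 import Thing, get_ontology

# Hypothetical IRI; the ontology only pins down stable, basic classes.
onto = get_ontology("http://example.org/datascience-core.owl")

with onto:
    class Dataset(Thing): pass      # any collection of records
    class Attribute(Thing): pass    # a column or variable of a dataset
    class Phase(Thing): pass        # e.g. business understanding, wrangling
    class Technique(Thing): pass    # e.g. imputation, outlier detection

onto.save(file="datascience-core.owl", format="rdfxml")
```

Fast-moving, project-specific classes are deliberately left out; they can be layered on
later without invalidating this core.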
As the example above shows, an identical word can be distinct across two domains. Moreover,
within the same domain, the interpretation of an ontology can differ completely between
projects (Karen, 2017). Even within a single project, it often happens that the domain
ontology is no longer compatible with the project. Hence, it is the ontology developer's
responsibility to implement a robust and scalable ontology that can cope with any
requirement that may arise over the span of the project. The ontology creator needs to be
skilful enough to comprehend the whole project rigorously; only then can they contribute the
full potential required for developing the project's ontologies.
1.3 PROTÉGÉ TOOL
Developing an ontology for a project is not a task for a novice. It certainly requires
significant knowledge of data science, but you also need a system on which those ideas can
be defined as ontologies. Just as coders use different editors for coding in specific
languages, a dedicated editor is needed for coding ontologies.

For example, for web development there are many editors, such as Sublime Text, Notepad++
and Komodo Edit. The job of these editors is to allow programmers to write the code that
brings web applications into existence. For Java and Android, there is an editor called
Eclipse. The fundamental purpose of these editors is to enable developers to write code in a
specific programming language for their projects. Furthermore, for certain languages there
are predefined library files that must be imported for the functions to run (Ontologies:
Practical Applications, 2018). As you explore a specific language on your development
journey, you will find many tools that are available free of cost, while others are
available on an annual or monthly basis; the developers of such software charge a purchase
fee as a royalty.
For ontologies, Protégé is widely considered the leading ontology engineering tool. Protégé
is free to download and, along with editing, is used as a knowledge management system.
Editors of this type give developers a platform where they can write and test their work
through a graphical user interface. With Protégé's graphical user interface, you can define
the ontologies for your project.
Protégé was created by Stanford University, which makes it available for public use under
the BSD 2-clause license. Earlier versions of the tool were developed in collaboration with
the University of Manchester.
The plugin architecture of the tool is robust enough to build not only simple ontologies
but also complex ones that can transform a project. To develop a wide range of productive
problem-solving systems, ontology creators can feed the output of Protégé into other
mainstream systems. Furthermore, since not everyone is familiar with the framework, Stanford
University supports new users by providing both community-based and paid support.

Protégé has an active community of students, developers and entrepreneurs who ask
questions, discuss solutions and share their plug-ins; the community's contributions provide
valuable answers to all kinds of development and technical questions. Protégé supports W3C
standards such as OWL and RDF. Another major aspect of Protégé is its open-source
environment: it is written in the object-oriented language Java and supports a plug-and-play
architecture, which gives users flexibility (Pierson, 2017).
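Because Protégé saves ontologies in standard W3C formats such as OWL, its output can be
consumed programmatically by other systems, as noted above. Below is a minimal sketch in
Python using the owlready2 library; the file path is hypothetical and reuses the core
ontology sketched earlier:

```python
from owlready2 import get_ontology

# Load an ontology exported from Protégé (hypothetical local path).
onto = get_ontology("file:///path/to/datascience-core.owl").load()

# Enumerate the classes the ontology defines, with their parents.
for cls in onto.classes():
    parents = [p.name for p in cls.is_a if hasattr(p, "name")]
    print(cls.name, "->", parents)
```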
2. The lifecycle of a data science project
The lifecycle of a data science project differs from project to project and from
organization to organization (Jacquette, 2014). The significance of any project lifecycle is
that it shows which procedure follows which, and how every phase of the project relates to
the rest. The basic steps of a data science project are described below.
2.1 Business understanding
Business understanding is the foundation of the whole data science project. This step helps
to clarify the purpose of the project: you get an idea of what is expected from the entire
project and why it should be developed. This step helps the researcher streamline the whole
methodology toward the project's end goal (Schutt & O'Neil, 2013). The researcher learns
which parameters of the business will affect the overall outcome of the project, so that the
concerned segments of the process can be modified accordingly, and which parameter values
will determine the success of the effort. Through this step, the researcher also identifies
the admissible data sources to which the enterprise needs full access, or for which a
framework can be built to monitor and assess the source.
Defining the goals and finding the relevant data sources needed to predict the project's
outcomes are the core objectives of this step. Defining the objective has the highest
priority, because the overall project takes shape around it. Asking precise questions about
the reason for the project's existence is the way to define the goals, because the objective
of every project is to answer a question and make the overall procedure smoother. Hence,
hard questions are the key ingredient in defining the foundation and objective of the
project. To answer such questions, various data sources must be analysed, which is the
subject of the next steps.
2.2 Data Understanding & Data Preparation
Data understanding and data preparation are notorious as the most time-consuming phases of
a data science project. That said, these processes have a significant impact on the overall
project in the end. As the name implies, data understanding means comprehending the data in
its raw format and finding the relevant attributes of the gathered data that will influence
the overall project. The difficulty of this analysis varies widely: the analysts may have to
examine a single spreadsheet that only needs filtering and modification, or they may have to
deal with millions of records and hundreds of attributes stored in a large data file.
Moreover, a data analyst should be able to identify which types of data are helpful for a
project, and, within that data, which attributes should be used to deliver it successfully.
After the rigorous work of understanding the data, the next step is data preparation. Data
preparation is the procedure in which analysts transform the data into refined information,
enabling the use of business intelligence or any other data analysis technique within an
organization. Data preparation can be performed alongside data understanding. As an example
of why preparation is needed, a dataset may contain information compiled in different
formats, different languages, with various calculations, and so on (Houlihan, 2016).

During data preparation, while merging everything into one major dataset, it may happen
that the information from one of the sources is not in the expected form, or is missing
entirely. In that case, the anticipated use of that information is void and the data cannot
be used for research. Major tasks of the data preparation phase include imputation of
missing data, joining of various datasets, changing of data types, identification of
outliers, and so on; a sketch of these tasks follows below. At the end of this phase, the
prepared data is confirmed as valid for its intended purpose and usable in a data science
project.
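Here is a minimal sketch of these preparation tasks using Python's pandas library; the file
names, column names and outlier threshold are hypothetical:

```python
import pandas as pd

# Load two hypothetical sources that must be joined.
customers = pd.read_csv("customers.csv")
orders = pd.read_csv("orders.csv")

# Change data types: parse dates that arrived as strings.
orders["order_date"] = pd.to_datetime(orders["order_date"])

# Join the datasets on a shared key.
df = orders.merge(customers, on="customer_id", how="left")

# Impute missing numeric values with the column median.
df["amount"] = df["amount"].fillna(df["amount"].median())

# Flag outliers with a simple z-score rule (the threshold is a judgment call).
z = (df["amount"] - df["amount"].mean()) / df["amount"].std()
df["is_outlier"] = z.abs() > 3

print(df.dtypes)
print(df["is_outlier"].sum(), "potential outliers flagged")
```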
2.3 Data Modelling
After completing the initial phases, there will be final datasets and information to be used
for project execution. In a data science project the volume of information is relatively
large, with many variables and attributes, so deciding the flow of execution is also very
important. Even if the data has been cleaned and merged properly from thousands of sources,
all the hard work done in the previous phases may fail to deliver the anticipated end
results if the flow of execution is not optimized (Houlihan, 2016).

Data modelling is the graphical representation of how the project will execute, step by
step. It can even be considered the flowchart of the whole process, in which every major,
decisive phase of the procedure is laid out so as to generate a successful result (Endel,
2015). A well-researched data model that satisfies the logical and conceptual sides of the
project ensures that everything stays under control, and allows researchers and analysts to
adjust the flow to achieve error-free, successful execution; one way to make such a flow
explicit in code is shown below.
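One common way to make the flow of execution explicit in code, rather than only in a
flowchart, is a scikit-learn Pipeline, in which each step is declared in order. This is a
generic sketch, not a method prescribed by the sources above, and the column names are
hypothetical:

```python
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Declare the preparation and modelling flow step by step,
# mirroring the flowchart of the procedure.
preprocess = ColumnTransformer([
    ("numeric", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), ["amount", "quantity"]),            # hypothetical numeric columns
    ("categorical", OneHotEncoder(handle_unknown="ignore"),
     ["region"]),                           # hypothetical categorical column
])

model = Pipeline([
    ("prepare", preprocess),
    ("classify", RandomForestClassifier(random_state=0)),
])
# model.fit(X_train, y_train) would execute the whole flow in order.
```

Because the flow is a single object, any one step can be changed without disturbing the
rest, which is precisely what a good data model is meant to allow.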
2.4 Evaluation
Evaluation bridges the gap between the producer and the end user. If this phase is done
properly, there is a significant gain in confidence that the product is likely to succeed in
the market. At the same time, evaluation demands a great deal of understanding of both the
market and the product. During this phase you can check what is working and what is not; if
you find an error, or the process requires modification, you can apply changes as many times
as needed until you get the desired output.
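In code, this check-and-revise loop usually takes the form of held-out evaluation metrics.
Below is a minimal sketch with scikit-learn, assuming the `model` pipeline from the previous
section and hypothetical data `X`, `y` produced by the earlier phases:

```python
from sklearn.metrics import classification_report
from sklearn.model_selection import cross_val_score, train_test_split

# Hold out data the model never sees during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model.fit(X_train, y_train)

# Check what is working and what is not, then revise and re-run.
print(classification_report(y_test, model.predict(X_test)))
print("5-fold CV accuracy:",
      cross_val_score(model, X_train, y_train, cv=5).mean())
```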
2.5 Deployment
Deployment is the final stage of the data science lifecycle. After collecting the data,
modelling it and performing a final evaluation, it comes down to deploying the result for
the end user. Once the models have performed well and produced acceptable, anticipated
results, deployment is the stage at which the product is released and becomes available to
the end user. Depending on the type of business and industry, deployment is done in either a
real-time or a production-like environment; some also prefer to deploy the application on a
batch basis to ensure maximum acceptance, as sketched below.
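As an illustration of the batch style of deployment mentioned above, the evaluated model can
be serialized once and then run on a schedule over new records. This sketch uses the joblib
library, with hypothetical file names:

```python
import joblib
import pandas as pd

# One-off step: persist the trained pipeline after evaluation.
joblib.dump(model, "advisory_model.joblib")

# Scheduled batch job: score the latest records and write results out.
def run_batch(input_path: str, output_path: str) -> None:
    pipeline = joblib.load("advisory_model.joblib")
    new_records = pd.read_csv(input_path)
    new_records["prediction"] = pipeline.predict(new_records)
    new_records.to_csv(output_path, index=False)

run_batch("new_records.csv", "scored_records.csv")
```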
3. Data Wrangling
In this era of information, the organizations with the most data have the most power, yet
it is generally not easy for companies to monitor and manage millions of records at the same
time. Data, as it is widely understood, is a collection of facts converted into a form that
a computer can comprehend (Makaranka, 2018). It is well known that data dominates the world,
irrespective of the industry you are associated with. From finance to healthcare, from real
estate to education, from information technology to government organizations, almost every
industry in the current world is driven and controlled by data (Endel, 2015).
We, as humans, communicate and understand the world around us in the form of information
shared with each other, while computers understand the same information in the form of
machine languages; this machine-readable information is known as data. It is a general
observation in corporate culture that managers spend a significant number of hours hunting
for relevant data (Upside Staff, 2018). The data processed and stored by companies is raw.
If used properly and efficiently, it can be transformed into productive information that
enhances knowledge of the overall process. Through data you can not only learn what has
happened so far, but also improve your services and products through the deeper insights
hidden inside the raw data. Ultimately, data is an enormous body of knowledge hidden in the
form of numbers and statistics.
If utilized properly, data can help a business improve exponentially. The current industrial
landscape demands quick adaptation and response to the data received in the backend, even
though a company's processes may take some time to react to a given set of data (Makaranka,
2018). Managers should therefore develop an intelligent way to process tons of data
accurately and efficiently.
In data science, where the data comes in more complex forms with hundreds or thousands of
variables, it becomes extremely important for researchers and business intelligence
officials to cut out unnecessary information, clean the datasets and arrive at the most
helpful values, ensuring the successful execution of the project (Owl, 2018).
As discussed in the lifecycle of a data science project, phases like data understanding and
data preparation determine the fate of the whole procedure. Together, data understanding and
data preparation are known as data wrangling. Since this project is about data wrangling,
this phase of the lifecycle is one of the most crucial parts of data science. Data wrangling
is the procedure of converting raw data into a more appropriate and understandable dataset
so that it can be better utilized for real-time applications. Globally, more and more data
is generated and stored in databases, so it is imperative that those messy, complicated
datasets be refined and put to use for deeper analysis. The challenge is to find the proper
data acquisition channels, the sources of the data, and then run a fast filtration that not
only converts the raw data but also makes it more informative and friendly for the analysts.
The idea is to gather as much data as possible and reveal its most important parts. Since
the sources of the gathered data differ, the format and variables covered in each dataset
will differ too. It is also possible that some datasets contain values that are no longer
valid, or that were corrupted for technical reasons, or that some values were never
available in the first place. So it is necessary to keep every possible scenario in mind
before going ahead (Houlihan, 2016). Third-party tools are available that can convert
datasets from different formats into a preferred format, giving a unified dataset. However,
the unavailable values and corrupted variables are still present in the unified format; in
that case, an expert researcher is needed to fill in the blank and invalid values. That
person may have to edit a few data entries manually so that the overall procedure can run
smoothly afterward; the sketch below shows this unification step.
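Here is a small sketch of this unification step in pandas, assuming two hypothetical sources
that arrive in different formats (CSV and JSON) and leave missing values behind after the
merge:

```python
import pandas as pd

# Two sources of the same records in different formats (hypothetical files).
from_crm = pd.read_csv("leads.csv")
from_api = pd.read_json("leads.json")

# Unify into one dataset with a consistent schema.
combined = pd.concat([from_crm, from_api], ignore_index=True)

# Values missing or corrupted at the source still need expert attention;
# here numeric gaps are filled with a sentinel a reviewer can search for.
missing_before = int(combined["score"].isna().sum())
combined["score"] = combined["score"].fillna(-1)   # -1 marks "needs review"
print(f"{missing_before} entries flagged for manual review")
```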
Most data scientists would accept that most of their time and resources are invested in data
filtration, cleaning and wrangling rather than in coding the module that will use the data.
Data wrangling gives the data its integrity: after the whole wrangling procedure, you obtain
a set of information that is consistent and solution-oriented (Kauermann, 2018), with old
and invalid data wiped out along the way. Depending on your project and desired outcomes,
you may want to add your own variables and information to the final dataset, and in the
tedious task of data cleaning, adding your own variables can increase the complexity of
managing so much information. Data wrangling addresses this by giving the analyst full
control over what to include in the final dataset (Upside Staff, 2018).
In an organization, a scenario may arise in which you need to explain those complex datasets
to colleagues, managers or stakeholders. Evidently, not everyone is comfortable analysing
such large datasets apart from a data scientist or data analyst (Upside Staff, 2018). At
that point, conveying your message and your hard work to employees, management and
stakeholders intelligently is what makes a huge difference to the future of your project.
Data wrangling turns complicated data into comprehensible, actionable datasets that convey
your message to the stakeholders.
4. Data Quality
Data is a set of values: the information gathered for reference or analysis purposes. Data
is the one thing that drives the major industries and sub-industries of information
technology; without it, the whole world would be in hibernation, because in this
ever-changing world data is the fundamental element behind everything. The data referred to
here is the data stored on the internet, or user-generated information available for
research purposes (Ostertag, 2010). It is the information that data scientists use while
developing a new product or executing a project. Without data, analysts cannot accomplish
anything productive, because they have no standpoint from which to compare their outcomes.
The quality of the data is equally important. When the domain is small and the engagement is
optimal, you can expect quality data; as the domain expands, however, the consistency of
data quality becomes significant. Data that is reliable for real-world applications is known
as quality data (Houlihan, 2016). The higher the quality of the data, the greater its
positive impact on the execution of the project. Data can also be considered good quality if
it reflects real-world scenarios accurately and efficiently (Cleveland, 2014); that
representation of the real world can then be used to improve the quality of the project, and
a project reaches another level when such accuracy is achieved through data quality.
In a data science project, the data gathered from numerous sources may be irrelevant at
times for obvious reasons (Cooper, 2012): the information collected in such scenarios comes
in different formats, with corrupted values, unfilled fields in the database, and so on.
Anything that lags, or any irrelevant information, reduces the quality of the data. Hence,
basic cleaning and validation procedures should be performed to ensure data quality; a
sketch follows below.
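A minimal sketch of such basic checks, assuming pandas and a hypothetical gathered dataset
(the 5% threshold is an assumption, not a standard):

```python
import pandas as pd

def quality_report(df: pd.DataFrame) -> dict:
    """Basic data-quality checks: missing values, duplicates, column types."""
    return {
        "rows": len(df),
        "missing_per_column": df.isna().sum().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
        "column_types": df.dtypes.astype(str).to_dict(),
    }

df = pd.read_csv("gathered_data.csv")   # hypothetical source
report = quality_report(df)
print(report)

# A simple acceptance rule: reject the batch if too much is missing.
total_missing = sum(report["missing_per_column"].values())
if total_missing / (report["rows"] * len(df.columns)) >= 0.05:
    raise ValueError("Data quality below threshold: too many missing values")
```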
4.1 Why is quality data required?
Quality data helps analysts and data scientists analyse the information stored in the
datasets and apply their knowledge to their projects, elevating the quality of a project to
a whole new level (Hiltbrand, 2018). If your data is of worthy quality, the database entries
will contain productive variables and values that allow you to react to real-time scenarios.
As already stated, the better the data reflects the real world, the higher its chance of
being considered top quality. Hence, if it reflects a real-world scenario that is positively
or negatively affecting your product, you can react immediately, deciding whether to change
something or improve the existing condition of the project (Dooley, 2018).

If you ever want to audit your project, quality data can be a blessing, because it halves
your effort while ensuring the maximum return on investment. Hence, it is advisable always
to gather quality-rich information (Caplin, 2017).
5. Building an advisory system as a web application
5.1 What is an advisory system?
An advisory system, as the name suggests, is a system that helps solve problems. Such
systems target issues usually handled by a human who dedicates time and intelligence to
solving them (Grow, 2018). The advisory system reduces that resource investment because it
embodies the intelligence required to solve the problems that may arise in a specific
scenario. In this sense, an advisory system can also be described as an expert system: this
type of system replicates human intelligence by coming up with real-time solutions. Hence,
an expert's opinion can be substituted thanks to the advisory system's ability to produce
solutions (Little, n.d.).
The really challenging phase comes when we need to develop such an advisory system for a
particular type of project. Ideally, the human intelligence used in solving problems has to
be encoded into machine-executable form, scalable enough to cope with the many issues that
may arise during execution; the code also needs to be robust enough to handle similar
potential scenarios (Calì, 2017). So if you are developing the module, your code must be
efficient enough, and make the machine intelligent enough, to cover several scenarios in a
single snippet; a minimal sketch follows below.
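Below is a toy rule-based sketch of such a system in Python, in the spirit of classic expert
systems; the rules, state keys and advice strings are invented for illustration and are not
the project's actual knowledge base:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Rule:
    condition: Callable[[dict], bool]   # predicate over the project state
    advice: str                         # what the system recommends

# Hypothetical rules covering several scenarios in one knowledge base.
RULES: List[Rule] = [
    Rule(lambda s: s.get("missing_ratio", 0) > 0.05,
         "High missing-value ratio: revisit data preparation."),
    Rule(lambda s: not s.get("objective_defined", False),
         "No objective defined: return to business understanding."),
    Rule(lambda s: s.get("cv_accuracy", 1.0) < 0.7,
         "Weak model: reconsider the features or the modelling flow."),
]

def advise(state: dict) -> List[str]:
    """Fire every rule whose condition matches the current project state."""
    return [rule.advice for rule in RULES if rule.condition(state)]

print(advise({"missing_ratio": 0.12, "objective_defined": True,
              "cv_accuracy": 0.65}))
```

A real advisory system would derive its rules from the domain ontology rather than
hard-coding them, but the shape of the reasoning is the same.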
5.2 The boon of an advisory system for novice users
Consider someone who gives advice to the decision maker of a team. The challenge for the
decision maker is to identify that a problem exists before making any decision about it. In
a company hierarchy, the decision maker is usually attached to management, meaning they have
other major roles besides making decisions for projects. Hence, they may not have enough
time to look for problems, come up with solutions and confirm a solution to each problem. In
this case, an advisory system can be extremely beneficial for the decision maker, as half of
the job is done by the system itself: all they must consider is the solution the system
finds, and if it is a perfect fit for the matter, they can give the final confirmation
(Bluttman, 2009).
However, one of the major obstacles is developing a robust advisory system with the
cognitive and reasoning abilities to solve any problem that may arise in the project. Its
job is to find alternative decision options with the highest chance of producing the most
desirable output in the end (J., 2015). The role of the advisory system is to provide
solutions to open-ended scenarios; in other words, it can be considered a highly skilled
architecture that gives intelligent solutions to unstructured problems. Hence, the decision
maker has to be experienced enough to decide whether to accept the solution suggested by the
advisory system; based on those skills, the project has a better chance of delivering the
expected results (J, 2018).
With the passage of time, real-world experience helps a novice analyst adapt to and evolve
with the system (Bera, 2017). A skilled analyst can grasp the functioning of the system
faster with the help of a smart advisory system; their job is half done if a trustworthy
advisory system is constantly monitoring for potential problems that might occur soon. The
system can thus anticipate issues before they happen and apply the necessary changes
instantly, thanks to a fine understanding between the decision maker and the system itself.

If everything goes well, a fresh analyst develops a thorough understanding of the whole
procedure quickly and can come up with creative, efficient ideas that might change the
direction of the entire process (Nagrecha, 2015). The advantage of newcomers in any
organization is that they are filled with enthusiasm and courage; given proper assistance
and help at the right time, these people can do wonders in the enterprise. A smart machine
and a passionate analyst are the best combination an organization can hope for at this time.
5.3 Why the advisory system should be developed as a web application
Since the advent of the internet, websites and the internet itself have become an integral
part of people's lifestyles (Biewald, 2018). It has been observed that a significant number
of people, not only working professionals but also college students, access the internet and
surf the web through websites (Bembenik, 2013). This clearly shows how comfortable people
are with finding information on the internet from anywhere, at any time, with just a few
taps and clicks on a smart device.

The major selling point of any web application is that it reduces the cost of the project.
There is no need for a physical installation wherever and whenever you want to run a web
application: every computer system today has a web browser, so you simply type the required
URL, hit enter, and you are there in a single click.
Another reason for choosing a web application is its round-the-clock availability, 24x7 and
365 days a year; there is no need to wait for specific hours of the day to get the job done.
Nor does a web application need any special hardware: a standard PC or a smartphone fits the
objective. You can access the web application anywhere and at any time in the world;
whenever you want the advisory system, you simply open its web application, irrespective of
your time zone and geographical location (Noy & McGuinness, 2001).

A web application also usually supports a centralized data storage structure, so anything
you do on the web is stored in a repository or the cloud, from which the files remain
accessible later (Kauermann, 2018). A minimal sketch of such a web front end follows below.
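Here is a minimal sketch of exposing the advisory system as a web application with Flask;
the route, port and payload fields are assumptions, and `advise` is the hypothetical
rule-based function sketched in section 5.1:

```python
from flask import Flask, jsonify, request

from advisory_rules import advise   # hypothetical module holding the rules

app = Flask(__name__)

@app.route("/advise", methods=["POST"])
def advise_endpoint():
    """Accept a project-state JSON payload and return the system's advice."""
    state = request.get_json(force=True)   # e.g. {"missing_ratio": 0.12}
    return jsonify({"advice": advise(state)})

if __name__ == "__main__":
    # Reachable from any browser or HTTP client, no installation required.
    app.run(host="0.0.0.0", port=8080)
```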
REFERENCES
Bembenik, R. (2013). Intelligent tools for building a scientific information platform. Berlin: Springer.

Bera, S. D. (2017). Life cycle reliability analysis using imprecise failure data. Life Cycle Reliability and Safety Engineering, 6(4).

Biewald, L. (2018). The data science ecosystem part 2: Data wrangling. Retrieved from https://www.computerworld.com/article/2902920/the-data-science-ecosystem-part-2-data-wrangling.html

Bluttman, K. (2009). Access Hacks. O'Reilly Media.

Brooks-Bartlett, J. (2018). Here's why so many data scientists are leaving their jobs. Retrieved from https://towardsdatascience.com/why-so-many-data-scientists-are-leaving-their-jobs-a1f0329d7ea4

Calì, A. W. (2017). Data Analytics. Cham: Springer International Publishing.

Caplin, A. (2017). Introduction to symposium on "Engineering data on individual and family decisions over the life cycle". Economic Inquiry, 56(1).

Chi, Y.-L. (2007). Elicitation synergy of extracting conceptual tags and hierarchies in textual document. Expert Systems with Applications, 349-357.

Chiff, M. (2018). Timely Information: How Current Is This? Retrieved from https://tdwi.org/articles/2018/03/02/bi-all-timely-information-how-current-is-this.aspx

Cimiano, P., Hotho, A., & Staab, S. (2005). Learning concept hierarchies from text corpora using formal concept analysis. Journal of Artificial Intelligence Research, 24, 305-339.

Cleveland, W. (2014). Divide and recombine (D&R): Data science for large complex data. Statistical Analysis and Data Mining: The ASA Data Science Journal, 7(6).

Cooper, J. (2012). Commentary on issues in data quality analysis in life cycle assessment. The International Journal of Life Cycle Assessment, 17(4).

Demand For Data Scientists Surge By 400% In India. (2018). BW Businessworld. Retrieved from http://www.businessworld.in/article/Demand-For-Data-Scientists-Surge-By-400-In-India-/11-07-2018-154540/

Dooley, B. J. (2018). Humans in the Loop for Machine Learning. Retrieved from https://tdwi.org/articles/2018/07/09/adv-all-humans-in-loop-for-machine-learning.aspx

Endel, F., & Piringer, H. (2015). Data Wrangling: Making data useful again. IFAC-PapersOnLine, 48(1).

Grow, G. (2018). Reducing the Impact of Bad Data on Your Business. Retrieved from https://tdwi.org/articles/2018/07/06/diq-all-reducing-the-impact-of-bad-data.aspx

Hiltbrand, T. (2018). 5 Advanced Analytics Algorithms for Your Big Data Initiatives. Retrieved from https://tdwi.org/articles/2018/07/02/adv-all-5-algorithms-for-big-data.aspx

Houlihan, P. (2016). Data wrangling.

J, L. (2018). Advancing science and technology with big data analytics. Statistical Analysis and Data Mining: The ASA Data Science Journal, 11(3), 97.

J., L. (2015). Big Research Data and Data Science. Data Science Journal, 14.

Jacquette, D. (2014). Ontology. Abingdon, Oxon: Routledge.

Kandel, S., Heer, J., et al. (2011). Research directions in data wrangling: Visualizations and transformations for usable and credible data. Information Visualization, 10(4), 271-288.

Karen. (2017). How to minimise data wrangling? Retrieved from https://dzone.com/articles/how-to-minimize-data-wrangling-and-maximize-data-i-1

Kauermann, G., & Seidl, T. (2018). Data Science: a proposal for a curriculum. International Journal of Data Science and Analytics.

Little, D. (n.d.). Fallibilism and Ontology in Tuukka Kaidesoja's Critical Realist Social Ontology. Journal of Social Ontology, 1(2).

Loat, N. (2015). What is Data Wrangling? Retrieved from https://www.datawatch.com/what-is-data-wrangling/

Lohr, S. (2012, February 11). The age of Big Data. The New York Times. Retrieved from https://www.nytimes.com/2012/02/12/sunday-review/big-datas-impact-in-the-world.html

Lohr, S. (2018). For Big-Data Scientists, 'Janitor Work' Is Key Hurdle to Insights. The New York Times. Retrieved from https://www.nytimes.com/2014/08/18/technology/for-big-data-scientists-hurdle-to-insights-is-janitor-work.html

Lv, G., & Hu, C. (2016). Research on recommender system based on ontology and genetic algorithm. Neurocomputing, 187, 92-97. https://doi.org/10.1016/j.neucom.2015.09.113

Makaranka, I. (2018). Real-Time Analytics: Challenges and Solutions. Retrieved from https://tdwi.org/articles/2018/06/15/adv-all-real-time-analytics-challenges-and-solutions.aspx

Maxim, G. (2018). Data Digest: Data Science Platforms, Job Ranking, and Marketing. Retrieved from https://tdwi.org/articles/2018/03/06/adv-all-gartner-data-science-0306.aspx

McAfee, A., & Brynjolfsson, E. (2012, October). Big Data: The Management Revolution. Harvard Business Review. Retrieved from https://hbr.org/2012/10/big-data-the-management-revolution

Nagrecha, S., & Chawla, N. V. (2015). Quantifying decision making for data science: from data acquisition to modeling. EPJ Data Science, 5(1).

Noy, N. F., & McGuinness, D. L. (2001). Ontology Development 101: A Guide to Creating Your First Ontology. Stanford Knowledge Systems Laboratory Technical Report; Stanford Medical Informatics Technical Report SMI-2001-0880.

Ontologies: Practical Applications. (2018). Retrieved from https://www.datasciencecentral.com/profiles/blogs/ontologies-practical-applications

Ostertag, S. (2010). Processing Culture: Cognition, Ontology, and the News Media. Sociological Forum, 25(4), 824-850.

Owl, M. (2018). Data Stories: Simplification and Abstraction. Retrieved from https://tdwi.org/articles/2018/03/14/bi-all-visualization-simple.aspx

Pierson, L. (2017). Data Science. Hoboken, NJ: John Wiley and Sons, Inc.

Press, G. (2016, March 23). Data preparation: Most time-consuming, least enjoyable data science task, survey says. Forbes. Retrieved from https://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/#732c23c36f63

Rattenbury, T., et al. (2017). Principles of Data Wrangling: Practical Techniques for Data Preparation. O'Reilly Media, Inc.

Schutt, R., & O'Neil, C. (2013). Doing Data Science. O'Reilly Media.

Tei, P. (2017). Fewer Flights, Bigger Delays and a Bad Year for JetBlue: 17 Charts on the U.S. Airline Industry in 2017. Retrieved from https://medium.com/towards-data-science/data-science/home

Upside Staff. (2018). Data Digest: Applying Machine Learning, Deep Learning, and AI. Retrieved from https://tdwi.org/articles/2018/06/28/adv-all-applications-0628.aspx

Upside Staff. (2018). Data Digest: Projecting Winners and Monitoring Screen Time. Retrieved from https://tdwi.org/articles/2018/06/19/adv-all-odd-applications-0619.aspx

Wang, J. (2016). Big data cloud, mining and management. International Journal of Data Science.

Zhao, L. (2014). Ontology Integration for Linked Data. Journal on Data Semantics, 3(4), 237-254.

Zheng, W. (2017). Data Wrangling Versus ETL: What's the Difference? Retrieved from https://tdwi.org/articles/2017/02/10/data-wrangling-and-etl-differences.aspx