
Domain Ontologies and the Lifecycle of a Data Science Project

   

Added on  2023-06-10

1 Background and Literature Review
The job of a data scientist is evolving with each passing day. My MSc project aims to help data
scientists approach a data science project more effectively. The project will work as a consultative
system that assists a data science expert. This advisory system will be based on the
concept of a domain ontology. Ontologies, in general, standardise the common characteristics
of a particular field of research in order to make research and development work easier (Chiff,
2018).
1. Domain Ontologies
Before examining domain ontologies, the life cycle of the project and the various techniques in
depth, some basic terms and characteristics need to be understood.
1.1 WHAT IS AN ONTOLOGY?
In this era of information and technology, every aspect of technology is evolving with each
passing moment (Maxim, 2018). The relationships, values and properties of a technology
also change as part of its evolution. The word ‘ontology’ has a broader general meaning than
it has in the field of technology: its dictionary meaning is the study of the existence of
things.
In the field of information and technology, an ontology is a set of generally accepted statements
in a field of study. In artificial intelligence and data science, an ontology is a set
of ideas, interactions, relations and events through which researchers and analysts share
information with each other. These are the pre-agreed ground rules of the respective field of
study (Wang, 2016).
Every field, as it evolves, creates complexities and differences of opinion between experts.
Especially in the field of technology, researchers, data scientists, and analysts often come up
with modified definitions, rules, and regulations for a field of study. Machine learning,
artificial intelligence, big data, and data science are some of the developing industries under
information and technology. Hence, experts believe that each field of study should have
predefined arguments, interactions, and values on the basis of which experts share their ideas
and develop innovative products and revolutionary technologies (Tandon, 2018). This idea of
standardising the foundation of the thought process is known as an ontology.
Ontologies limit, and may overcome, the complexity of ideas and fundamentals in a field of
study. An ontology can also be described as a controlled vocabulary that is maintained globally
in an area of technology (Zheng, 2017). This accepted vocabulary is the foundation on which
researchers develop research papers to improve the field of technology. Because the papers are
built on shared ontologies, translating them becomes easier, and translation is necessary for
those papers to be accepted globally (Lohr, 2018).
Ontologies provide the foundation on which further studies can be conducted for a particular
domain. Consider a fact-based process in the field of data science (Zhao, 2014), and treat it
as a fundamental procedure of the field. If we describe it or give it a name for future use,
that name becomes part of the ontology. Hence, an ontology is the representation, or formal
naming, of a set of concepts, procedures, and relations between entities (Sean Kandel, 2011).
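The idea of formally naming concepts, procedures, and relations can be illustrated with a minimal Python sketch. It stores an ontology as a set of (subject, predicate, object) triples. All concept and relation names here (DataCleaning, takes_input, and so on) are hypothetical and chosen only for illustration; they are not taken from any cited source.

```python
from collections import namedtuple

# A relation between two named concepts: (subject, predicate, object).
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# Hypothetical concept and relation names, for illustration only.
ontology = {
    Triple("DataCleaning", "is_a", "Procedure"),
    Triple("DataCleaning", "takes_input", "RawDataset"),
    Triple("DataCleaning", "produces", "CleanDataset"),
    Triple("ModelTraining", "is_a", "Procedure"),
    Triple("ModelTraining", "takes_input", "CleanDataset"),
}


def related(triples, subject, predicate):
    """Return the sorted objects linked to `subject` by `predicate`."""
    return sorted(
        t.object
        for t in triples
        if t.subject == subject and t.predicate == predicate
    )


def concepts_of_type(triples, type_name):
    """Return the sorted concepts declared with `is_a` as `type_name`."""
    return sorted(
        t.subject
        for t in triples
        if t.predicate == "is_a" and t.object == type_name
    )


if __name__ == "__main__":
    print(related(ontology, "DataCleaning", "produces"))
    print(concepts_of_type(ontology, "Procedure"))
```

Because every procedure and relation has an agreed name, two people querying this structure for "what does DataCleaning produce" get the same answer, which is the point of the ground rules described above.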
So, ontologies can be considered the ground rules. In a sport, there is a set of rules that
determines how points, fouls, penalties, etc. are awarded. In the same way, to develop a data
science project smoothly and successfully, you need ontologies to secure a satisfactory
outcome. These ground rules must be developed carefully, as they will serve as the foundation
of the whole project (Loat, 2015).
1.2 WHAT IS THE “DOMAIN ONTOLOGY”?
A domain ontology represents a set of ideas that belong to a specific universe of discourse,
for example Biology, Politics, or Information and Technology. These are three different domains
whose ontologies are entirely distinct from each other (Tei, 2017). The relations and
characteristics in the ontologies of these independent domains differ not only in meaning but
also in value. The term “mouse” occurs in both Biology and Information and Technology, but it is
interpreted completely differently. In Biology, the word mouse means an animal. In the domain
of Information and Technology, it means a piece of hardware used as an input device for
personal computers (Zhao, 2014).
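The “mouse” example can be sketched in a few lines of Python. The sketch assumes, purely for illustration, that each domain ontology is a dictionary from a term to the concept it denotes; the dictionary layout and the function name `interpret` are hypothetical.

```python
# Each domain ontology maps a term to the concept it denotes in that domain.
# The two meanings follow the example in the text; the layout is an assumption.
DOMAIN_ONTOLOGIES = {
    "biology": {"mouse": "animal"},
    "information_technology": {"mouse": "input device"},
}


def interpret(term, domain):
    """Return what `term` means within `domain`, or None if it is undefined."""
    return DOMAIN_ONTOLOGIES.get(domain, {}).get(term)


if __name__ == "__main__":
    for domain in DOMAIN_ONTOLOGIES:
        print(domain, "->", interpret("mouse", domain))
```

The same term resolves to a different concept depending on which domain ontology is consulted, and a domain with no entry for the term returns nothing at all.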
The concept of a domain ontology revolves around the relations and concepts of a domain within
a field of study. Human understanding continues to evolve, so within the same field of interest
ontologies can diverge because experts hold different perspectives (Endel, 2015): each author
has their own vision, their own approach to a problem, their own way of executing a project,
and so on. Hence it is not practical to develop a complete domain ontology for a domain: it is
expensive, and it requires a continuing investment of time and resources to keep pace with an
ever-changing world (Sean Kandel, 2011).
For a given project, it is imperative that the domain ontology is defined before development
starts, so that certain concepts, relations between properties, values, etc. are fixed in
advance. Building on those fixed definitions makes better development possible at later stages
(Brooks-Bartlett, 2018).
However, as already stated, it is practically impossible to cover a whole domain in a set of
ontologies. The world changes rapidly, so the resources, time and money invested in developing
a set of ontologies may not pay off, because those domain ontologies may no longer be relevant
in the future. At most, you can create a core domain ontology that covers only the basic
classes of the domain. Dynamic classes added to the ontology may lose their relevance over
time, so defining just the basic classes is a good start for developing a domain ontology.
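One way to picture a core domain ontology is as a small, stable class hierarchy that individual projects extend without modifying. The Python sketch below assumes a parent-pointer representation; the class names (Thing, Dataset, Model, Task, TimeSeriesDataset) are hypothetical placeholders for a data science domain, not classes proposed by any cited source.

```python
# Core ontology: each class maps to its parent class (None marks the root).
# Class names are hypothetical placeholders, for illustration only.
CORE_CLASSES = {
    "Thing": None,
    "Dataset": "Thing",
    "Model": "Thing",
    "Task": "Thing",
}


def add_class(classes, name, parent):
    """Return a new hierarchy with `name` added under an existing `parent`.

    The input hierarchy is left unchanged, so the core stays stable.
    """
    if parent not in classes:
        raise KeyError(f"unknown parent class: {parent}")
    if name in classes:
        raise ValueError(f"class already defined: {name}")
    extended = dict(classes)
    extended[name] = parent
    return extended


def is_subclass_of(classes, name, ancestor):
    """True if `ancestor` is `name` itself or lies on its path to the root."""
    current = name
    while current is not None:
        if current == ancestor:
            return True
        current = classes[current]
    return False


# A project-specific extension built on top of the unchanged core.
project_classes = add_class(CORE_CLASSES, "TimeSeriesDataset", "Dataset")

if __name__ == "__main__":
    print(is_subclass_of(project_classes, "TimeSeriesDataset", "Thing"))
    print("TimeSeriesDataset" in CORE_CLASSES)
```

Keeping the volatile, project-specific classes outside the core means that when they stop being relevant, only the extension is discarded and the investment in the basic classes is preserved.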
In the example above, the same word has distinct meanings in two domains. Even within the same
domain, ontologies will be interpreted differently for distinct projects (Karen, 2017). Even
within the same project, it often happens that the domain ontology stops being compatible with
the project. Hence the ontology developer is responsible for implementing a robust and scalable
ontology that can cope with any requirement that may arise over the span of the project.
The ontology creator needs to be skilful enough to understand the whole project rigorously.
Only then can they bring their full potential to developing ontologies for the project.
1.3 PROTÉGÉ TOOL
Developing an ontology for a project is not a task for a novice. It certainly requires
significant knowledge of data science, but you also need a system in which you can define those
ideas as ontologies. Just as coders use different editors for coding in specific languages,
there needs to be a dedicated editor for coding ontologies.
For example, for web development there are many editors, such as Sublime Text, Notepad++, and
Komodo Edit. The job of these editors is to let programmers write the code that brings web
applications into existence. For Java and Android, there is an editor called Eclipse. The
fundamental purpose of these editors is to enable developers to write code in a specific
programming language for their projects. Furthermore, for some kinds of coding, predefined
library files must be imported for the functions to run (Ontologies: Practical Applications.,
2018). For any given language, some tools are available free of charge, while others are
available for an annual or monthly fee set by their developers.
For ontologies, Protégé is widely considered the leading ontology engineering tool. Protégé
is free to download. As well as editing, Protégé is used as a knowledge management system.
Like the editors above, it gives developers a platform where they can write and test their
work. It does so through a graphical user interface, with which you can define the ontologies
for your project.
Protégé was created by Stanford University, which made it available for public use under the
BSD 2-Clause license. Earlier versions of the tool were developed in collaboration with the
University of Manchester.
The plug-in architecture of the tool is robust enough to build not only simple ontologies but
also complex ones. To develop a wide range of problem-solving systems, ontology creators can
feed the output of Protégé into other mainstream systems. Furthermore, not everyone will be
familiar with the framework, so Stanford University supports new users through community-based
and paid support.
Protégé has an active community of students, developers and entrepreneurs who ask questions,
discuss solutions and share their plug-ins. The community's contributions provide valuable
answers to many development and technical questions. Protégé supports the W3C standards for
ontologies, such as the OWL 2 Web Ontology Language and RDF. Another major advantage of Protégé
is its open-source environment. It is written in Java, an object-oriented language, and
supports plug-and-play extensions, which gives users flexibility (Pierson, 2017).