SIT717 Assignment: A Comprehensive Survey of QAS Classification


Added on 2022/09/18

AI Summary

This report presents a survey on the classification of Question Answering Systems (QASs). It begins by introducing the concept of QASs and their importance in information retrieval and natural language processing, highlighting their potential application in business intelligence. The survey then defines QASs and their role in providing automated answers to natural language queries, differentiating between factoid, definitional, and complex questions. The report examines the process of QASs, including linguistic analysis, question classification, and query generation. The survey further explores the criteria for classifying QASs, including application domains, question types, and data sources. It then delves into the classification of QASs based on their domain, distinguishing between general domain QASs (GDQASs) and restricted domain QASs, outlining their respective advantages, disadvantages, and research issues. The report concludes by summarizing the key findings and discussing potential future research directions in the field of QAS classification.

Classification of QASs: A Survey
Title: Classification of Question Answering Systems (QASs): A Survey

Abstract
Natural language processing has emerged as one of the most prominent branches in artificial
intelligence alongside other fields such as business intelligence. But what if two prominent fields
can interact, and become collaborative? That is the question which often arises when the topic of
data management is brought up.
Objective
The objective of this paper is to conduct a survey of Question answering systems (QASs) as a
means of generating answers to questions that are asked in natural languages. Moreover, we seek
to examine how the concept of Question answering systems can be applied to the field of
business intelligence.
Findings
From the survey, we find that QASs can be applied to business intelligence through data
warehouse integration. Moreover, we note that there are up to 8 criteria for classifying QASs.

Introduction
Ever since the onset of the “Age of information”, which began during the 1970s, the question of
information access and knowledge has been dominant in a number of innovations. This can be
seen in computer innovations such as search engines, which have to a large extent revolutionized
the way people access information throughout the globe.
Generally, search engines respond with a ranked list of documents that are somewhat relevant
and related to the keywords formulated by the user. The ranking is based on several aspects,
including:
i. popularity estimates
ii. keyword correspondence
iii. how often documents are retrieved, etc.
Despite the impact of search engines on the way people retrieve information on the internet,
search engines do not entirely accomplish the purpose of extracting information, since users
still have to explore the retrieved documents individually in order to determine which of the
documents hold the best information according to their needs (Ferret, et al., 2011).
In ideal scenarios, a search engine ought to give, “…few relevant and concise sentences as
answers along with their corresponding web links” (Kolomiyets, 2011). Since the late 1960s,
quite a number of QASs have been developed (Kolomiyets, 2011). In recent years, QASs
have been developed with the integration of modern methods such as natural language processing,
which enables the QASs to respond to user questions in natural language following advanced
information retrieval and processing from a range of knowledge bases (Vanessa Lopez, 2011).

Moreover, the answers given by modern QASs have shifted from traditional text to multimedia
(Dwivedi, 2013). The QASs which have been developed since the 1960s tackle, “…different
domains, data sources, types of questions, formats of answers, etc.; the number of such QASs
is too large” (Mishra & Jain, 2015). As such, to explore and determine how these QASs perform,
as well as their ability to satisfy current and future needs, it is only fitting that a survey
of such QASs is conducted.
Practically, the classification of QASs follows explicit criteria, namely: “the application
domains, questions, data sources, matching functions, and answers” (Mishra & Jain, 2015). In
this regard, we conduct a literature survey on QASs classified using application domains and
questions, and suggest a future survey for QASs based on the other criteria, which are not
covered in this paper.
To address the study’s objectives, the remainder of the paper is organized as follows. Section D
(Body of the Survey) is divided into several subsections: related literature on QASs, criteria
for classifying QASs, classification of QASs in relation to the aforementioned criteria, and a
comparison of the classification of QASs. The last section of the paper presents a
conclusion/summary of our survey.
Body of the Survey
Related Literature on QASs
Question-Answer System
To start us off, let’s define what a QAS is. According to Cimiano, et al. (2014), “…question-
answering is a computer science discipline within the fields of information retrieval and natural
language processing (NLP), which is concerned with building systems that automatically answer
questions posed by humans in a natural language”. Another definition holds that a QAS, as an
implementation of QA, is a computer program that constructs its answers by querying a
structured database, also referred to as a “knowledge base” (Cimiano, et al., 2014). Another
widely accepted definition is that of BOUZIANE, et al. (2015), which rests on the argument that,
“For human-computer interaction, natural language is the best information access mechanism for
humans.” As such, QASs hold special importance as well as advantages compared to search
engines, and can be viewed as the ultimate objective of semantic Web research for “…user’s
information needs” (BOUZIANE, et al., 2015).
In the context of information retrieval and NLP, QA is identified as the process of
automatically providing answers to a query posed by a human in natural language. To this end,
question answering is often split into three distinct tasks, that is: “question analysis,
document retrieval and answer extraction” (Cimiano, et al., 2014).
According to (Lopez, et al., 2011), most QASs adopt the posited tasks as shown in figure 1
below:
Figure 1: Source https://reader.elsevier.com/reader/sd/pii/S1877050915034663 (BOUZIANE, et al., 2015)
The question-answering aspect is handled using NLP at the QAS’s user-facing end point, where
users may ask a wide range of questions. More specifically, factoid questions are questions
concerned with a Named Entity (NE). These use words such as “When, Where, How much/many, Who,
and What”. Such words make the query specific to attributes such as date/time, place, person,
or organization. The second type of question is concerned with the definition of a given
term/concept. Questions which use the words "Why" or "How" form the third variant and are
relatively difficult to answer. As such, quite few attempts have been made to answer such
questions.
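The three question types described above can be illustrated with a small rule-based sketch in Python. This is only a minimal illustration under stated assumptions: the cue-word lists, the function name classify_question and the labels "factoid", "definition" and "complex" are hypothetical and are not taken from any of the cited systems, which typically rely on much richer linguistic analysis.

```python
# Hypothetical cue words for factoid questions (illustrative only).
FACTOID_CUES = ("when ", "where ", "who ", "how much ", "how many ", "what ")


def classify_question(question: str) -> str:
    """Assign a question to one of the three broad types by its leading words."""
    q = question.strip().lower()

    # Definition questions ask for the meaning of a term or concept.
    if q.startswith(("define ", "what is meant by ")) or q.endswith((" mean?", " mean")):
        return "definition"

    # "Why" and plain "How" questions form the harder third variant.
    # "How much/many" is excluded because it asks for a quantity (factoid).
    is_quantity = q.startswith(("how much ", "how many "))
    if q.startswith("why ") or (q.startswith("how ") and not is_quantity):
        return "complex"

    # Remaining wh-questions are treated as factoid (named-entity) questions.
    if q.startswith(FACTOID_CUES):
        return "factoid"

    return "unknown"


if __name__ == "__main__":
    for text in ("When was the campaign first held?",
                 "What does ontology mean?",
                 "Why is this question hard to answer?"):
        print(text, "->", classify_question(text))
```

A rule set this small will misclassify many real questions, which is consistent with the observation that question classification is itself a source of errors in QASs.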
In practice, the objective of QASs is to enable different users to conduct queries in Natural
languages, employing their unique terminology, and ultimately receive a relatively concise answer.
Question-Answering Systems and Web-data
Since 1999, annual research in open-domain question answering has been conducted, supported
by the TREC evaluation campaign. The role of the TREC campaign is to make available a local
data set, which is viewed as the knowledge base used in producing answers. However, the World
Wide Web has seen an ever-increasing number of databases which are, to a large extent, useful
in providing information to interested users. Lately, QASs are largely web-oriented; that is,
the user queries such systems in natural language.
Process
The process is initiated through linguistic analysis, i.e. “…dependency graphs using a syntactical
parser with a step of named entities recognition NER” (BOUZIANE, et al., 2015). Subsequently,
the given question is classified according to a predefined question category. The SPARQL
query is then defined on the basis of these two steps, that is, linguistic analysis and question
classification. Moreover, an external ontology resource can be used for matching items obtained
during the process. In the end, once the SPARQL query has been generated, the Linked Data
(often a training dataset) is interrogated, and the system outputs a concise answer to the
user’s question (BOUZIANE, et al., 2015). This scenario is summarized in figure 2 below:

Figure 2: Source https://reader.elsevier.com/reader/sd/pii/S1877050915034663 (BOUZIANE, et al., 2015)
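The process described above (linguistic analysis, question classification, then SPARQL query generation) can be sketched in Python as follows. This is a toy sketch under loud assumptions: the "ex:" vocabulary, the CATEGORY_TO_PROPERTY mapping (standing in for an external ontology resource) and the capitalised-word heuristic (standing in for a real parser and NER step) are all invented for illustration and do not reproduce the system of BOUZIANE, et al. (2015). The query is only built as a string; it is not sent to any endpoint.

```python
# Hypothetical mapping from question category to an ontology property.
# The "ex:" vocabulary is invented for illustration only.
CATEGORY_TO_PROPERTY = {
    "date": "ex:birthDate",
    "place": "ex:birthPlace",
}


def analyse(question: str) -> dict:
    """Toy stand-in for linguistic analysis: the named entity is assumed to be
    the capitalised words that follow the leading question word."""
    words = question.rstrip("?").split()
    if not words:
        raise ValueError("empty question")
    entity_words = [w for w in words[1:] if w[0].isupper()]
    return {"wh_word": words[0].lower(), "entity": " ".join(entity_words)}


def classify(analysis: dict) -> str:
    """Map the question word to a predefined question category."""
    return {"when": "date", "where": "place"}.get(analysis["wh_word"], "other")


def build_sparql(question: str) -> str:
    """Combine analysis and classification into a SPARQL query string."""
    analysis = analyse(question)
    category = classify(analysis)
    prop = CATEGORY_TO_PROPERTY.get(category)
    entity = analysis["entity"]
    if prop is None or not entity:
        raise ValueError(f"no query template for: {question!r}")
    return (
        "PREFIX ex: <http://example.org/>\n"
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?answer WHERE {\n"
        f'  ?s rdfs:label "{entity}" .\n'
        f"  ?s {prop} ?answer .\n"
        "}"
    )


if __name__ == "__main__":
    print(build_sparql("When was Alan Turing born?"))
```

Questions that fall outside the two toy categories raise a ValueError here; a fuller system would need a template, or a learned mapping, for each supported question category.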
Criteria for Classifying Question Answering Systems
To understand the criteria of classifying QASs, it is prudent that one understands the architecture
behind the whole concept of QASs as presented in figure 3 below:
Figure 3: General architecture of a QAS. Source: (Mishra & Jain, 2015). Components shown in the
figure: User Interface (users ask questions and get back answers); Question Analysis and
Classification (EAT, syntactical and higher level analysis, question focusing and question
classes); Question Representation (database tables, bag of words, logical); Knowledge Base;
Document Analysis and Representation (syntactic and higher level analysis, bag of words, logical,
etc.); Retrieval Model; Answers Processing Module (ranking/fusion); Extracted/Generated Answer;
Answer Verification; Answer Post Processing; Answer.
Based on the brief literature presented in the preceding section, we note that there are up to 8
criteria that can be used in the classification of available QASs. These include:
i. Application domains
ii. Question types raised by the users
iii. Analysis types conducted on users’ questions as well as source documents
iv. Type of data explored from data sources
v. Features of data sources
vi. Types of views applied to questions as well as any matching functions
vii. Methods adopted when attempting to retrieve answers
viii. Types of answers extracted by the QASs
Classification of QASs
This section discusses the details of the study’s posited classification of question-answer
systems. Generally, we offer a description of each class and explore the advantages and
disadvantages of the two types of QASs classified in each class, paired with related research
issues. During implementation of QASs, it follows that the answers are generated with regard to
the questions which have been asked (Moldovan, et al., 2011). That is, some users might need
information on a general issue while others might require specific information which is to be
obtained from a given domain. Thus, it is somewhat natural to classify QASs with reference to
their domains.
General domain QASs (GDQASs)
This type of QAS provides answers to domain-independent user queries. In general domain
QASs, the system scans for answers across a variety of document databases. This method
generally offers responses of moderate rather than high quality, and the queries are usually
made by casual users (Indurkhya & Damereau, 2010).
The drawback of general domain QASs is that the quality of answers is relatively low, and any
satisfaction with the answers is largely dependent on the specific usage of these systems. On
the bright side, the systems boast many users. Moreover, GDQASs can function without any
domain specific dictionary. Equally, “…Users don’t need to acquire knowledge of domain
specific keywords for formulating questions.” (Indurkhya & Damereau, 2010).
Restricted domain QASs
As the name suggests, restricted domain question-answer systems provide answers only to
questions in a specified domain. That is, responses are drawn from a domain-specific document
collection. Thus, the warehouse of the related question patterns is highly fixed, with the
result that the systems have a better probability of obtaining good accuracy when answering
questions.
These QASs utilize domain specific “ontology and terminology”; hence, they generally offer high
quality query responses. As a consequence, different restricted domain question-answer systems
have developed their own literature including, for instance: “temporal domain QAS, geospatial
domain QAS, medical domain QAS, patent QAS, community based QAS” (Vanessa Lopez,
2011).
Implementation of these QASs necessitates assigning a given question to an appropriate
domain specific QAS on the basis of knowledge extracted from the keywords presented in the
question. Generally, issues are encountered when directing queries to a given restricted
domain QAS, since the QASs face “…question classification problems, ambiguity resolution
problems” (Indurkhya & Damereau, 2010).
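The assignment of a question to a domain specific QAS on the basis of its keywords can be sketched as a simple keyword-overlap router in Python. This is a minimal sketch under stated assumptions: the DOMAIN_KEYWORDS vocabularies, the function name route_question and the fallback label "general" are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical domain vocabularies; a real system would use a
# domain-specific ontology or terminology instead.
DOMAIN_KEYWORDS = {
    "medical": {"dose", "symptom", "symptoms", "drug", "treatment"},
    "geospatial": {"river", "city", "border", "latitude", "distance"},
    "patent": {"patent", "inventor", "claim", "filing"},
}


def route_question(question: str, default: str = "general") -> str:
    """Pick the domain whose vocabulary overlaps most with the question."""
    tokens = set(re.findall(r"[a-z]+", question.lower()))
    scores = {domain: len(tokens & vocab)
              for domain, vocab in DOMAIN_KEYWORDS.items()}
    best = max(scores.values())
    winners = [domain for domain, score in scores.items() if score == best]
    # No keyword matched, or several domains tie: the question is ambiguous,
    # so fall back to the default (general domain) system.
    if best == 0 or len(winners) > 1:
        return default
    return winners[0]


if __name__ == "__main__":
    print(route_question("What is the recommended dose of this drug?"))
    print(route_question("Who is the inventor of the drug?"))
```

The second call shows the ambiguity problem in miniature: the question matches one keyword in each of two domains, so the sketch cannot choose between them and falls back to the general system.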
The merits of these systems lie in the fact that they tend to suit domain expert users, because
such users often require specialized responses to their queries. In addition, responses to such
questions have relatively high quality, and the satisfaction level is dependent on the user’s
domain knowhow.

The main disadvantage of these systems is that they tend to have a limited repository from
which to draw responses; hence, they can answer only a limited number of questions.
Question type classification
Processing answers for users’ questions has a direct relationship with the type of questions that
the users ask (Moldovan, et al., 2013). As such, the classification of questions performed in
QASs has a direct effect on responses. Studies indicate that up to 36.4% of errors stem from
question misclassification in QASs (Moldovan, et al., 2013). In their paper, Li & Roth (2012)
classify questions into, “…a fine grained content based categorization but they deal with a very
limited class of real world questions.” Further, Fan, et al. (2010) conduct a function oriented
classification of queries through the integration of matching patterns alongside different
machine learning methods. Elsewhere, Farah (2014) conducts a similar classification by
focusing on the types of answers expected by users.
One particular category of questions in question based classification of QASs is the factoid type,
where the questions are quite simple and fact based. Other categories include list type,
hypothetical type, confirmation, and causal questions. In implementation, the merit of factoid
type questions as used in classifying QASs is that many of them have labeled characteristics
through which answers can be extracted from documents using named entity tagging systems.
As such, it is possible to obtain good accuracy.
The main demerit of factoid questions is that the “…identification of factoid type questions and
their further sub classification automatically is itself a research issue in QASs” (Cui, et al.,
2011).
Use of QASs in business intelligence
Question-answering systems have an extensive application in the field of business intelligence.
For instance, data warehousing forms a core of BI applications. However, the vast majority of
the data repositories held by data warehouses require suitable methods with which to extract
information. Amongst business and academic practitioners, it is an unwritten objective that
businesses should not rely only on internal data but also integrate external sources into their data
collection systems. One means of achieving this is through integrating question-answering
systems into the BI operation to aid in the process of data warehousing. It is possible to
achieve seamless integration by presenting the data instantiated from both data warehouses and
question answering systems in the business dashboards, which enables users to handle both
types of data. In addition, “…the QA results are stored in a persistent way
through a new DW repository in order to facilitate comparison of the obtained results with
different questions or even the same question with different dates” (Ferrández, et al., 2014).
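The idea of storing QA results persistently so that answers to the same question on different dates can be compared can be sketched in Python with the standard sqlite3 module. This is only an illustrative sketch: the table name qa_results, its columns and the helper functions are assumptions made here, not the repository design of Ferrández, et al. (2014).

```python
import sqlite3
from datetime import date


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a small repository for QA results.
    The table layout is hypothetical and kept deliberately minimal."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS qa_results ("
        " question TEXT NOT NULL,"
        " asked_on TEXT NOT NULL,"   # ISO date string, e.g. 2014-01-31
        " answer TEXT NOT NULL)"
    )
    return conn


def save_result(conn: sqlite3.Connection, question: str,
                answer: str, asked_on: date) -> None:
    """Persist one answer together with its question and date."""
    conn.execute(
        "INSERT INTO qa_results (question, asked_on, answer) VALUES (?, ?, ?)",
        (question, asked_on.isoformat(), answer),
    )
    conn.commit()


def history(conn: sqlite3.Connection, question: str) -> list:
    """Return (date, answer) pairs for one question, oldest first."""
    rows = conn.execute(
        "SELECT asked_on, answer FROM qa_results"
        " WHERE question = ? ORDER BY asked_on",
        (question,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    store = open_store()
    save_result(store, "example question", "first answer", date(2014, 1, 1))
    save_result(store, "example question", "later answer", date(2014, 2, 1))
    print(history(store, "example question"))
```

Because ISO date strings sort chronologically, ordering by the asked_on column is enough to line up answers to the same question over time for display in a dashboard.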
Conclusion
The transition from dependency on traditional search engines as a means of retrieving
information from the web or any other data repository to other methods, such as the question
answering systems explored in this paper, is quickly gaining momentum, with a wide range of
applications to that effect. The question, however, remains one of accuracy and relevancy of
the results. It is therefore crucial that, before adopting any particular system, the
business or entity concerned takes into consideration all the factors and aspects that are
likely to influence the usability of the system in its programs.
For instance, a business dealing with pharmaceuticals might have a different QAS in comparison
to an automobile related business.
Despite the different applications of the methods explored, it is evident that each application of
QASs, explored in this paper from the classification perspective, has its own strengths and
weaknesses. These observations form the foundation of any future work, in that we suggest
that research be conducted to determine which QAS is suitable in which scenario.
Limitations
The limitations of this study lie in the time and resource constraints, which prevented us from
exploring the full extent of the concept of question answering systems, which in its own right is
an interesting concept, especially for the field of business intelligence in relation to research
involving data management.