QUT IFN647: Advanced Information Retrieval Problem Solving Task 3

Added on  2022/10/04

Homework Assignment
AI Summary
This assignment, focusing on advanced information retrieval, encompasses several key areas within the field of computer science and information technology. The assignment begins by applying Broder's taxonomy to classify web search queries, differentiating between navigational, transactional, and informational search intents. It then delves into natural language processing (NLP), defining six core areas: morphological, lexical, syntactic, semantic, discourse, and pragmatic analysis, and identifying which NLP techniques are applied in algorithms like Brill Tagger, Porter Stemmer, and Wordnet. The role of ontology in information retrieval is explored, contrasting it with traditional thesauri, and discussing its advantages (precision, semantic comprehension) and disadvantages (keyword limitations). Finally, the assignment concludes with a PageRank calculation, computing the PageRank of three documents over three iterations, demonstrating the algorithm's application in ranking web pages based on link structure. The assignment draws upon key academic resources and provides a comprehensive overview of information retrieval techniques.
Name
University
Course
Instructor
Question 1: (4 marks)
a) Using Broder’s taxonomy of web search, identify the correct class for the following web
information needs: Answers with reference to (Broder, 2002)
1. “I want to find the QUT homepage”.
Navigational web search
2. “I want to find the best provider for health insurance for single female, aged 20-40”
Transactional web search
3. “I want to find out information about Queensland’s Old Government House. For
example: When was it built? Who lived there? At which University is it situated in?”
Informational web search
4. “I want to sign up for a cheap music streaming service that has coverage of songs from
the 1980s, 1990s and 2000s”.
Transactional web search
Question 2: (3 Marks)
a) Define the following six areas of natural language processing that are commonly associated
with information retrieval. (1 sentence, max. 50 words for each area of NLP) Answered with
reference to (Liddy, 1998) and (Manning, Raghavan, and Schütze, 2008)
1) Morphological
It is an area of natural language processing that deals with the study of morphemes
(prefixes, suffixes, and roots), the structural components that form a word.
2) Lexical
It is an area of natural language processing that deals with the study at the level of
words with respect to their lexical meaning and parts of speech.
3) Syntactic
It is an area of natural language processing that deals with the study of the grammatical
structure of a sentence, that is, how its words are arranged and relate to one another.
4) Semantic
It is an area of natural language processing that deals with the study of the meaning of
words and how they can be combined to make meaningful phrases in a sentence.
5) Discourse
It is an area of natural language processing that deals with the study of interpretation
of structure and meaning conveyed by text larger than a sentence and the effect a
preceding sentence has on the next sentence in terms of interpretation.
6) Pragmatic
It is an area of natural language processing that deals with the study of purposeful use
and comprehension of language depending on the situation at hand.
b) Identify which area or areas of natural language processing are applied in the following
algorithms/techniques. (For each simply state the analysis employed)
1) Brill Tagger
Lexical analysis
2) Porter Stemmer
Morphological analysis
3) WordNet
Lexical analysis
4) Noun Phrase Shallow Parser/Chunker
Syntactic analysis, applied on top of lexical part-of-speech tags
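To illustrate why stemming counts as morphological analysis, the following is a minimal sketch of suffix stripping. It is not the Porter algorithm: the real Porter Stemmer applies several ordered rule phases with conditions on the remaining stem. The suffix list and the `crude_stem` function are hypothetical and exist only for this illustration.

```python
# Toy suffix stripper: a much-simplified illustration of morphological
# analysis. This is NOT the Porter algorithm; the suffix list below is a
# hypothetical, illustrative rule set.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    lowered = word.lower()
    for suffix in SUFFIXES:
        if lowered.endswith(suffix) and len(lowered) - len(suffix) >= 3:
            return lowered[: -len(suffix)]
    return lowered


for w in ["retrieved", "retrieves", "indexing", "links"]:
    print(w, "->", crude_stem(w))
```

Different surface forms of a word ("retrieved", "retrieves") collapse to one index term, which is the effect a stemmer is used for in information retrieval.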
Question 3: (4 Marks)
The role of ontology in information retrieval
An ontology differs from a thesaurus, a traditional aid to information retrieval, in that it
also includes instances and rules. Ontology brings precision and relevance to information
retrieval (Jain & Singh, 2013). The advantages of ontology are: accurate comprehension of user
needs, semantic comprehension of documents, knowledge-based information retrieval during
search, a search process with strong reasoning capability that depends on artificial
intelligence, and integration of heterogeneous knowledge (Sánchez et al., 2012). The
disadvantage of ontology is its limited capability to capture the conceptualization of user
needs when those needs are expressed through keywords (Sy et al., 2012).
Ontology plays the following roles in information retrieval: expansion of queries from user
input during searches; abstraction of information, where concepts can acquire properties from
one another through inheritance; formalization of semantics, where concepts are interpreted and
relations between concepts are realized; and natural language understanding, where user needs
are mapped to information resources (Jain & Singh, 2013). Through these roles, ontology
improves the efficiency of information retrieval. Searching by meaning in an ontology is
designed to overcome the limitations of searching by keywords (Wimalasuriya & Dou, 2010).
The roles ontology plays in making information retrieval effective and efficient, together
with the fact that it is still being improved, make it suitable for IR.
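The query expansion role described above can be sketched with a toy example. Everything here is an assumption made for illustration: the `ONTOLOGY` dictionary is a hypothetical hand-written structure, not a real ontology or an API from any of the cited works, and `expand_query` simply adds synonyms and inherited narrower concepts to the user's terms.

```python
# Hypothetical toy ontology: each concept lists its narrower (is-a child)
# concepts and its synonyms. None of these entries come from a real ontology.
ONTOLOGY = {
    "vehicle": {"narrower": ["car", "bicycle"], "synonyms": []},
    "car": {"narrower": ["sedan"], "synonyms": ["automobile"]},
    "bicycle": {"narrower": [], "synonyms": ["bike"]},
    "sedan": {"narrower": [], "synonyms": []},
}


def expand_query(terms):
    """Return the query terms plus their synonyms and all narrower concepts."""
    expanded = []
    stack = list(reversed(terms))  # process terms in their original order
    while stack:
        term = stack.pop()
        if term in expanded:
            continue
        expanded.append(term)
        entry = ONTOLOGY.get(term)
        if entry is None:
            continue  # unknown terms are kept but not expanded
        for synonym in entry["synonyms"]:
            if synonym not in expanded:
                expanded.append(synonym)
        # Follow the is-a hierarchy downwards (depth first).
        stack.extend(reversed(entry["narrower"]))
    return expanded


print(expand_query(["car"]))
print(expand_query(["vehicle"]))
```

A keyword search for "vehicle" alone would miss a document that only mentions "sedan"; expanding through the concept hierarchy is one way searching by meaning can recover it.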
Question 4: (3 marks)
A collection consists of 3 pages, D1, D2, D3, with links as follows:
D1 -> D2
D2 -> D1
D2 -> D3
D3 -> D1
1) Write the formula for PageRank and explain the terms used in the PageRank formula
\[
PR_{i+1}(P_i) = \sum_{P_j} \frac{PR_i(P_j)}{C(P_j)}
\]
PR_{i+1}(P_i) is the PageRank of page P_i in the next iteration (iteration i+1)
PR_i(P_j) is the PageRank, in the current iteration i, of a web page P_j that points to P_i;
the sum runs over all such pages
C(P_j) is the number of outgoing links on the pointing page P_j (Page et al., 1999).
2) Compute the PageRank of all 3 documents for each of the first 3 iterations
Web Page   Iteration 0   Iteration 1   Iteration 2   Iteration 3   Page Rank
D1         1/3           1/2           1/3           5/12          3
D2         1/3           1/3           1/2           1/3           2
D3         1/3           1/6           1/6           1/4           1
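As a check on the Iteration 1 column, the values can be derived by hand from the formula in part 1, using the link counts C(D1) = 1, C(D2) = 2, and C(D3) = 1 read off the link list and the starting value 1/3 for every page:

```latex
\[
\begin{aligned}
PR_1(D1) &= \frac{PR_0(D2)}{C(D2)} + \frac{PR_0(D3)}{C(D3)}
          = \frac{1/3}{2} + \frac{1/3}{1} = \frac{1}{2} \\
PR_1(D2) &= \frac{PR_0(D1)}{C(D1)} = \frac{1/3}{1} = \frac{1}{3} \\
PR_1(D3) &= \frac{PR_0(D2)}{C(D2)} = \frac{1/3}{2} = \frac{1}{6}
\end{aligned}
\]
```

The three values sum to 1, as they should when every page has at least one outgoing link. Iterations 2 and 3 repeat the same step with the previous column as input.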
[Figure: link graph with nodes D1, D2, and D3]
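The iteration can also be reproduced programmatically. The sketch below assumes the simplified formula from part 1 (no damping factor), a uniform starting value of 1/3, and that every page has at least one outgoing link; the names `LINKS` and `pagerank_iterations` are illustrative and not part of the assignment. Exact fractions are used so the output can be compared directly with the table.

```python
from fractions import Fraction

# Link structure from the question: page -> pages it links to.
LINKS = {"D1": ["D2"], "D2": ["D1", "D3"], "D3": ["D1"]}


def pagerank_iterations(links, iterations):
    """Return one rank dict per iteration, starting with iteration 0.

    Simplified PageRank without damping:
    PR_{i+1}(P) = sum over pages Pj linking to P of PR_i(Pj) / C(Pj).
    Assumes every page has at least one outgoing link.
    """
    pages = sorted(links)
    ranks = {page: Fraction(1, len(pages)) for page in pages}
    history = [ranks]
    for _ in range(iterations):
        new_ranks = {page: Fraction(0) for page in pages}
        for source, targets in links.items():
            # Each page splits its current rank evenly over its out-links.
            share = ranks[source] / len(targets)
            for target in targets:
                new_ranks[target] += share
        ranks = new_ranks
        history.append(ranks)
    return history


for i, ranks in enumerate(pagerank_iterations(LINKS, 3)):
    print(f"Iteration {i}:", {page: str(rank) for page, rank in ranks.items()})
```

After three iterations this yields 5/12 for D1, 1/3 for D2, and 1/4 for D3, matching the Iteration 3 column.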
References
Broder, A. (2002). A taxonomy of web search. SIGIR Forum, 36(2), 3-10. Retrieved from
http://www.cis.upenn.edu/~nenkova/Courses/cis430/p3-broder.pdf
Jain, V., & Singh, M. (2013). Ontology based information retrieval in semantic web: A
survey. International Journal of Information Technology and Computer Science
(IJITCS), 5(10), 62. Retrieved from
http://mecs-press.org/ijitcs/ijitcs-v5-n10/IJITCS-V5-N10-6.pdf
Jain, V., & Singh, M. (2013). Ontology development and query retrieval using protégé
tool. International Journal of Intelligent Systems and Applications (IJISA), 5(9), 67.
Retrieved from
https://www.researchgate.net/profile/Mayank_Singh19/publication/276231453_Ontology_Development_and_Query_Retrieval_using_Protege_Tool/links/5775134508ae1b18a7dfbf6c.pdf
Liddy, E. (1998). Enhanced Text Retrieval Using Natural Language Processing. Bulletin of the
American Society For Information Science and Technology, 24(4), 14-16. Retrieved from
https://www.marilia.unesp.br/Home/Instituicao/Docentes/EdbertoFerneda/MRI%2006%20%20Liddy,%20ED%20-%201998.pdf
Manning, C. D., Raghavan, P., & Schütze, H. (2008). Chapter 8: Evaluation in information
retrieval. In Introduction to Information Retrieval (pp. 151-175). Cambridge University
Press. Retrieved from http://nlp.stanford.edu/IR-book/pdf/08eval.pdf
Page, L., Brin, S., Motwani, R., & Winograd, T. (1999). The PageRank citation ranking:
Bringing order to the Web. Retrieved from http://ilpubs.stanford.edu:8090/422/
Sánchez, D., Batet, M., Isern, D., & Valls, A. (2012). Ontology-based semantic similarity: A new
feature-based approach. Expert Systems with Applications, 39(9), 7718-7728. Retrieved
from https://www.sciencedirect.com/science/article/pii/S0957417412000954
Sy, M. F., Ranwez, S., Montmain, J., Regnault, A., Crampes, M., & Ranwez, V. (2012). User
centered and ontology based information retrieval system for life sciences. BMC
Bioinformatics, 13(1), S4. Retrieved from
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-13-S1-S4
Wimalasuriya, D. C., & Dou, D. (2010). Ontology-based information extraction: An introduction
and a survey of current approaches. Retrieved from
https://journals.sagepub.com/doi/abs/10.1177/0165551509360123