Information Retrieval Techniques and Evaluation


Added on  2020/05/16

AI Summary
This assignment delves into the world of information retrieval, examining both Boolean and Vector models used in document search. It compares the performance of Google and Yahoo search engines by evaluating their precision and recall for specific queries related to obtaining a unit guide. The analysis provides insights into the effectiveness of these search engines and highlights the factors influencing search result quality.

COVER PAGE

Contents
COVER PAGE
Question 1
1 creating inverted index
Inverted index
2 Boolean and vector queries
Question 2
Bibliography
Question 1
1 creating inverted index
The three documents are:
Science (DOC 1)
Science is a systematic enterprise that builds and organizes knowledge in the form of testable
explanations and predictions about the universe
Computer vision (DOC 2)
Computer vision is a field of computer science that works on enabling computers to see, identify and
process images in the same way that human vision does.
Artificial intelligence (DOC 3)
Artificial Intelligence is a field that has a long history but is still constantly and actively growing and
changing
a. Elimination of stop words
Science (DOC 1)
Science systematic enterprise builds organizes knowledge testable explanations predictions universe
Computer vision (DOC 2)
Computer vision field computer science works enabling computers identify process images human
vision
Artificial intelligence (DOC 3)
Artificial Intelligence field long history constantly actively growing changing
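The stop-word elimination above can be sketched in a few lines of Python. This is only an illustration: the stop-word list is an assumption, hand-picked to be just large enough to reproduce the output shown for the three documents, and the function name `remove_stop_words` is hypothetical.

```python
import re

# Assumption: a small hand-picked stop-word list chosen to match the output
# shown for the three example documents; real systems use fuller lists.
STOP_WORDS = {
    "is", "a", "that", "and", "in", "the", "of", "about", "on", "to",
    "same", "way", "does", "has", "but", "still", "form", "see",
}

def remove_stop_words(text):
    """Lowercase the text, split it into words, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

doc3 = ("Artificial Intelligence is a field that has a long history but is "
        "still constantly and actively growing and changing")
doc3_tokens = remove_stop_words(doc3)
print(" ".join(doc3_tokens))
```

The tokens are lowercased here so that "Field" and "field" are treated as the same term in later steps.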
After applying the Porter stemming algorithm, the documents become:
Science (DOC 1)
Science system enterprise build organize knowledge test explain predict universe
Computer Vision (Doc2)
Compute vision field compute science work enable compute identify process image human vision
Artificial Intelligence (Doc3)
Artificial Intelligence field long history constant active grow change
Inverted index
Step 1: List normalized tokens for each document
Term Doc ID
Science 1
System 1
Enterprise 1
Build 1
Organize 1
Knowledge 1
Test 1
Explain 1
Predict 1
Universe 1
Compute 2
Vision 2
Field 2
Compute 2
Science 2
Work 2
Enable 2
Compute 2
Identify 2
Process 2
Image 2
Human 2
Vision 2
Artificial 3
Intelligence 3
Field 3
Long 3
History 3
Constant 3
Active 3
Grow 3
Change 3

Step 2: Sort the terms alphabetically
Term Doc ID
Active 3
Artificial 3
Build 1
Change 3
Compute 2
Compute 2
Compute 2
Constant 3
Enable 2
Enterprise 1
Explain 1
Field 2
Field 3
Grow 3
History 3
Human 2
Identify 2
Image 2
Intelligence 3
Knowledge 1
Long 3
Organize 1
Predict 1
Process 2
Science 1
Science 2
System 1
Test 1
Universe 1
Vision 2
Vision 2
Work 2
Step 3: Merge multiple occurrences of the same term
Term Freq Doc ID
Active 1 3
Artificial 1 3
Build 1 1
Change 1 3
Compute 3 2
Constant 1 3
Enable 1 2
Enterprise 1 1
Explain 1 1
Field 1 2
Field 1 3
Grow 1 3
History 1 3
Human 1 2
Identify 1 2
Image 1 2
Intelligence 1 3
Knowledge 1 1
Long 1 3
Organize 1 1
Predict 1 1
Process 1 2
Science 1 1
Science 1 2
System 1 1
Test 1 1
Universe 1 1
Vision 2 2
Work 1 2
a. Create dictionary and related posting file
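One possible way to build the dictionary and postings from the normalized tokens, following steps 1 to 3 above, is sketched below. The variable names (`pairs`, `postings`, `dictionary`) are illustrative, and the tokens are lowercased copies of the Step 1 list.

```python
from collections import Counter, defaultdict

# Normalized tokens per document, copied from Step 1 (lowercased).
docs = {
    1: "science system enterprise build organize knowledge test explain predict universe",
    2: "compute vision field compute science work enable compute identify process image human vision",
    3: "artificial intelligence field long history constant active grow change",
}

# Step 1: list (term, doc_id) pairs.
pairs = [(term, doc_id) for doc_id, text in docs.items() for term in text.split()]

# Step 2: sort by term, then by document ID.
pairs.sort()

# Step 3: merge repeated pairs into (term, doc_id) -> term frequency.
freq = Counter(pairs)

# Postings: term -> list of (doc_id, term frequency).
postings = defaultdict(list)
for (term, doc_id), tf in sorted(freq.items()):
    postings[term].append((doc_id, tf))

# Dictionary: term -> number of documents containing the term.
dictionary = {term: len(plist) for term, plist in postings.items()}

for term in sorted(dictionary):
    print(term, dictionary[term], postings[term])
```

Each dictionary entry stores the document frequency of a term, and the matching postings list records which documents contain it and how often.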
b. Testing
After testing some of the keywords from the inverted index in the Google search engine, the results
returned were precise and related to the three documents.
2 Boolean and vector queries
a. Boolean queries
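i) (field ∧ artificial ∧ ¬build)
"field" occurs in documents 2 and 3, "artificial" only in document 3, and "build" only in document 1, so this query returns document 3.
ii) ((field ∨ feild) ∧ science)
The misspelling "feild" is not in the index. "field" occurs in documents 2 and 3 and "science" in documents 1 and 2, so this query returns document 2.
iii) ((intelligence ∨ inteligence) ∧ (science ∨ sceince) ∧ (field))
"intelligence" occurs only in document 3 and "science" only in documents 1 and 2, so no document satisfies all three conditions and this query returns no documents.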
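The three Boolean queries can be checked against the inverted index with plain set operations. The sketch below is a minimal illustration: the helper `posting` and the variable names are assumptions, and the postings are copied from the merged index (Step 3).

```python
# Postings copied from the merged inverted index: term -> set of document IDs.
ALL_DOCS = {1, 2, 3}
index = {
    "field": {2, 3},
    "artificial": {3},
    "build": {1},
    "science": {1, 2},
    "intelligence": {3},
}

def posting(term):
    """Return the documents containing a term; unknown terms match nothing."""
    return index.get(term, set())

# i) field AND artificial AND NOT build
q1 = posting("field") & posting("artificial") & (ALL_DOCS - posting("build"))

# ii) (field OR feild) AND science
q2 = (posting("field") | posting("feild")) & posting("science")

# iii) (intelligence OR inteligence) AND (science OR sceince) AND field
q3 = ((posting("intelligence") | posting("inteligence"))
      & (posting("science") | posting("sceince"))
      & posting("field"))

print(q1, q2, q3)  # {3} {2} set()
```

AND maps to set intersection, OR to union, and NOT to the complement with respect to the full document set.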
b. Vector model using cosine similarity
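Given the query Q = (science, field, intelligence, intelligence), the vector space has three dimensions: (science, field, intelligence). Each document and the query is represented by its term counts along these dimensions. For example, a document containing (science, field, field) maps to <1,2,0>.
For D1
D1 = "science" = <1,0,0>
Q = "science, field, intelligence, intelligence" = <1,1,2>
\[ \sigma(D_1, Q) = \frac{1 \times 1 + 0 \times 1 + 0 \times 2}{\sqrt{1^2 + 0^2 + 0^2}\,\sqrt{1^2 + 1^2 + 2^2}} = \frac{1}{\sqrt{1}\,\sqrt{6}} \approx 0.41 \]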
For D2
D2 = "science, field, field" = <1,2,0>
Q = "science, field, intelligence, intelligence" = <1,1,2>
\[ \sigma(D_2, Q) = \frac{1 \times 1 + 2 \times 1 + 0 \times 2}{\sqrt{1^2 + 2^2 + 0^2}\,\sqrt{1^2 + 1^2 + 2^2}} = \frac{3}{\sqrt{5}\,\sqrt{6}} \approx 0.55 \]
For D3
D3 = "field, field, intelligence" = <0,2,1>
Q = "science, field, intelligence, intelligence" = <1,1,2>
\[ \sigma(D_3, Q) = \frac{0 \times 1 + 2 \times 1 + 1 \times 2}{\sqrt{0^2 + 2^2 + 1^2}\,\sqrt{1^2 + 1^2 + 2^2}} = \frac{4}{\sqrt{5}\,\sqrt{6}} \approx 0.73 \]
According to the similarity index of each document with the query, DOC3 has the highest
similarity (0.73), DOC2 is second (0.55) and DOC1 has the lowest (0.41). This would mean that
DOC3 would appear as the top search result, followed by DOC2 and then DOC1.
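The cosine similarities can be recomputed with a short script, which also makes the arithmetic easy to check. This is a minimal sketch using the three document vectors and the query vector given above; the function name `cosine` is illustrative.

```python
import math

def cosine(d, q):
    """Cosine similarity between two equal-length term-count vectors."""
    dot = sum(x * y for x, y in zip(d, q))
    norm = math.sqrt(sum(x * x for x in d)) * math.sqrt(sum(y * y for y in q))
    return dot / norm if norm else 0.0

# Dimensions: (science, field, intelligence)
Q = (1, 1, 2)
vectors = {"D1": (1, 0, 0), "D2": (1, 2, 0), "D3": (0, 2, 1)}

scores = {name: round(cosine(vec, Q), 2) for name, vec in vectors.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

print(scores)   # {'D1': 0.41, 'D2': 0.55, 'D3': 0.73}
print(ranking)  # ['D3', 'D2', 'D1']
```

Note that the dot product for D3 is 0 x 1 + 2 x 1 + 1 x 2 = 4, which gives 4 / (√5 x √6) ≈ 0.73.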
The difference between the Boolean model and the vector model is that the Boolean model shows which
documents will appear in the search results but does not show the order in which they
will appear. The vector model ranks the documents: when the search query is run, they are
ordered by their similarity index with the query.
Question 2
a. Target and designed queries
My two search engines are Google and Yahoo.
My target is Target 3: obtain the unit guide of SIT773.
Designed search queries
Query 1 = SIT773 unit guide
Query 2 = SIT773 (unit,course) guide
Expressed as Boolean queries, these become
Query 1 = (SIT773 ∧ unit ∧ guide)
Query 2 = (SIT773 ∧ (unit ∧ course) ∧ guide)
Google search engine precision vs recall for query 1, query 2 and the average
Figure 1: Google Search Engine
b. Yahoo search engine
Figure 2: Yahoo search engine
c. Average for Google and Yahoo
Figure 3: Comparison by average
Evaluation
According to the chart shown in Figure 3, Google outperforms Yahoo on both measures. Google's
precision, the fraction of retrieved results that are relevant, is higher than Yahoo's. The same
applies to recall, the fraction of all relevant results that were actually retrieved, which is
also higher for Google than for Yahoo.
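The two measures used in this evaluation can be computed as sketched below. The result identifiers and relevance judgements are hypothetical placeholders, not the data behind Figures 1 to 3, and the function name `precision_recall` is illustrative.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant results retrieved / all results retrieved
    recall    = relevant results retrieved / all relevant results
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 5 results returned, 4 relevant pages exist in total.
retrieved = ["r1", "r2", "r3", "r4", "r5"]
relevant = {"r1", "r3", "r7", "r9"}

p, r = precision_recall(retrieved, relevant)
print(p, r)  # 0.4 0.5
```

Averaging these values over Query 1 and Query 2 for each engine gives the kind of per-engine comparison summarized in Figure 3.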
Bibliography
1&1, 2017. Information Retrieval: the Great Search for Knowledge. 1&1 Digital guide. Available at:
https://www.1and1.com/digitalguide/online-marketing/search-engine-marketing/information-retrieval-how-search-engines-retrieve-data/ [Accessed February 2, 2018].

Manning, C.D., Raghavan, P. & Schütze, H., 2008. Introduction to Information Retrieval. Cambridge: Cambridge
University Press.