Information Retrieval Techniques and Evaluation
Added on 2020/05/16
AI Summary
This assignment delves into the world of information retrieval, examining both Boolean and Vector models used in document search. It compares the performance of Google and Yahoo search engines by evaluating their precision and recall for specific queries related to obtaining a unit guide. The analysis provides insights into the effectiveness of these search engines and highlights the factors influencing search result quality.
COVER PAGE
Contents
COVER PAGE
Question 1
1 creating inverted index
Inverted index
2 Boolean and vector queries
Question 2
Bibliography
Question 1
1 creating inverted index
The three documents are:
Science (DOC 1)
Science is a systematic enterprise that builds and organizes knowledge in the form of testable
explanations and predictions about the universe
Computer vision (DOC 2)
Computer vision is a field of computer science that works on enabling computers to see, identify and
process images in the same way that human vision does.
Artificial intelligence (DOC 3)
Artificial Intelligence is a field that has a long history but is still constantly and actively growing and
changing
a. Elimination of stop words
Science (DOC 1)
Science systematic enterprise builds organizes knowledge testable explanations predictions universe
Computer vision (DOC 2)
Computer vision field computer science works enabling computers identify process images human vision
Artificial intelligence (DOC 3)
Artificial Intelligence field long history constantly actively growing changing
After applying the Porter stemming algorithm, the documents become:
Science (Doc1)
Science system enterprise build organize knowledge test explain predict universe
Computer Vision (Doc2)
Compute vision field compute science work enable compute identify process image human vision
Artificial Intelligence (Doc3)
Artificial Intelligence field long history constant active grow change
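The stop-word elimination step above can be sketched in a few lines of Python. This is a minimal illustration, not the tool used in the assignment: the `STOP_WORDS` set and the `remove_stop_words` helper are assumptions, with the stop list inferred from which words disappear between the original documents and the filtered versions (note that "form", "see", "same" and "way" are treated as stop words there).

```python
import re

# Assumed stop list, inferred from the words dropped in the filtered documents above.
STOP_WORDS = {
    "is", "a", "that", "and", "in", "the", "form", "of", "about", "on",
    "to", "see", "same", "way", "does", "has", "but", "still",
}

def remove_stop_words(text):
    """Lowercase the text, split it into alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

doc1 = ("Science is a systematic enterprise that builds and organizes "
        "knowledge in the form of testable explanations and predictions "
        "about the universe")
doc3 = ("Artificial Intelligence is a field that has a long history but is "
        "still constantly and actively growing and changing")

print(remove_stop_words(doc1))
print(remove_stop_words(doc3))
```

The token lists printed here match the filtered documents above, apart from being lowercased.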
Inverted index
Step 1: List normalized tokens for each document
Term Doc ID
Science 1
System 1
Enterprise 1
Build 1
Organize 1
Knowledge 1
Test 1
Explain 1
Predict 1
Universe 1
Compute 2
Vision 2
Field 2
Compute 2
Science 2
Work 2
Enable 2
Compute 2
Identify 2
Process 2
Image 2
Human 2
Vision 2
Artificial 3
Intelligence 3
Field 3
Long 3
History 3
Constant 3
Active 3
Grow 3
Change 3
Step 2: Sort the terms alphabetically
Term Doc ID
Active 3
Artificial 3
Build 1
Change 3
Compute 2
Compute 2
Compute 2
Constant 3
Enable 2
Enterprise 1
Explain 1
Field 2
Field 3
Grow 3
History 3
Human 2
Identify 2
Image 2
Intelligence 3
Knowledge 1
Long 3
Organize 1
Predict 1
Process 2
Science 1
Science 2
System 1
Test 1
Universe 1
Vision 2
Vision 2
Work 2
Step 3: Merge multiple occurrences of the same term
Term Freq Doc ID
Active 1 3
Artificial 1 3
Build 1 1
Change 1 3
Compute 3 2
Constant 1 3
Enable 1 2
Enterprise 1 1
Explain 1 1
Field 1 2
Field 1 3
Grow 1 3
History 1 3
Human 1 2
Identify 1 2
Image 1 2
Intelligence 1 3
Knowledge 1 1
Long 1 3
Organize 1 1
Predict 1 1
Process 1 2
Science 1 1
Science 1 2
System 1 1
Test 1 1
Universe 1 1
Vision 2 2
Work 1 2
a. Create dictionary and related posting file
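A dictionary and postings file of this kind can be sketched as follows. The sketch walks through the three steps above (list, sort, merge) on the normalized tokens from the stemming step, lowercased. The names `build_index`, `dictionary` and `postings` are illustrative assumptions, and the in-memory layout is only one possible representation.

```python
from collections import defaultdict

# Normalized tokens per document, copied from the stemming step (lowercased).
docs = {
    1: "science system enterprise build organize knowledge test explain predict universe".split(),
    2: "compute vision field compute science work enable compute identify process image human vision".split(),
    3: "artificial intelligence field long history constant active grow change".split(),
}

def build_index(docs):
    # Step 1: list (term, doc_id) pairs. Step 2: sort them alphabetically.
    pairs = sorted((term, doc_id) for doc_id, tokens in docs.items() for term in tokens)
    # Step 3: merge repeated pairs into a term frequency per document.
    postings = defaultdict(dict)
    for term, doc_id in pairs:
        postings[term][doc_id] = postings[term].get(doc_id, 0) + 1
    # Dictionary: each term mapped to its document frequency.
    dictionary = {term: len(plist) for term, plist in postings.items()}
    return dictionary, dict(postings)

dictionary, postings = build_index(docs)
print(postings["compute"])    # {2: 3}
print(postings["field"])      # {2: 1, 3: 1}
print(dictionary["science"])  # 2
```

Here the dictionary holds each term once with the number of documents containing it, while the postings record, per term, the documents and the term frequency in each, mirroring the Term / Freq / Doc ID table.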
b. Testing
After testing some keywords from the inverted index using the Google search engine, the results
returned were precise and related to the three documents.
2 Boolean and vector queries
a. Boolean queries
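i) (field ∧ artificial ∧ ¬build)
In the inverted index, "field" occurs in documents 2 and 3, "artificial" only in document 3, and "build" only in document 1, so this query returns document 3.
ii) ((field ∨ feild) ∧ science)
The misspelling "feild" matches no term in the index. "field" occurs in documents 2 and 3 and "science" in documents 1 and 2, so this query returns document 2.
iii) ((intelligence ∨ inteligence) ∧ (science ∨ sceince) ∧ (field))
"intelligence" occurs only in document 3, while "science" occurs only in documents 1 and 2, so no document satisfies all three conditions and this query returns no documents.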
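Boolean queries like these can be checked against the inverted index with plain set operations. The sketch below is an illustration under assumptions: the `postings` sets are taken from the merged table in step 3 (lowercased), and `ALL_DOCS` and `docs_for` are hypothetical helper names. AND maps to set intersection, OR to union, and NOT to the complement within the collection.

```python
# Postings as sets of document IDs, taken from the merged table in step 3.
ALL_DOCS = {1, 2, 3}
postings = {
    "field": {2, 3},
    "artificial": {3},
    "build": {1},
    "science": {1, 2},
    "intelligence": {3},
}

def docs_for(term):
    # Terms missing from the index (such as misspellings) match no documents.
    return postings.get(term, set())

# i) field AND artificial AND NOT build
q1 = docs_for("field") & docs_for("artificial") & (ALL_DOCS - docs_for("build"))
# ii) (field OR feild) AND science
q2 = (docs_for("field") | docs_for("feild")) & docs_for("science")
# iii) (intelligence OR inteligence) AND (science OR sceince) AND field
q3 = ((docs_for("intelligence") | docs_for("inteligence"))
      & (docs_for("science") | docs_for("sceince"))
      & docs_for("field"))

print(q1, q2, q3)  # {3} {2} set()
```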
b. Vector model using cosine similarity
Given the query (science, field, intelligence, intelligence)
Document one is (science, field, field)
This results in three dimensions: (science, field, intelligence)
i.e. for D1
D1 = <1, 0, 0>
Q = "science, field, intelligence, intelligence" = <1, 1, 2>
\[
\sigma(D_1, Q) = \frac{1 \times 1 + 0 \times 1 + 0 \times 2}{\sqrt{1^2 + 0^2 + 0^2}\,\sqrt{1^2 + 1^2 + 2^2}} = \frac{1}{\sqrt{1}\,\sqrt{6}} \approx 0.41
\]
For D2
D2 = "science, field, field" = <1, 2, 0>
Q = "science, field, intelligence, intelligence" = <1, 1, 2>
\[
\sigma(D_2, Q) = \frac{1 \times 1 + 2 \times 1 + 0 \times 2}{\sqrt{1^2 + 2^2 + 0^2}\,\sqrt{1^2 + 1^2 + 2^2}} = \frac{3}{\sqrt{5}\,\sqrt{6}} \approx 0.55
\]
For D3
D3 = <0, 2, 1>
Q = "science, field, intelligence, intelligence" = <1, 1, 2>
\[
\sigma(D_3, Q) = \frac{0 \times 1 + 2 \times 1 + 1 \times 2}{\sqrt{0^2 + 2^2 + 1^2}\,\sqrt{1^2 + 1^2 + 2^2}} = \frac{4}{\sqrt{5}\,\sqrt{6}} \approx 0.73
\]
According to the similarity of each document with the query, DOC3 has the highest similarity
(0.73), followed by DOC2 (0.55), while DOC1 has the lowest (0.41). This means that
DOC3 would appear as the top search result, followed by DOC2 and then DOC1.
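The cosine similarities can be recomputed directly from the vectors as written, over the dimensions (science, field, intelligence). This is a minimal sketch; the `cosine` helper and the variable names are assumptions introduced for illustration.

```python
import math

def cosine(d, q):
    """Cosine similarity: dot product divided by the product of vector lengths."""
    dot = sum(x * y for x, y in zip(d, q))
    norm = math.sqrt(sum(x * x for x in d)) * math.sqrt(sum(y * y for y in q))
    return dot / norm if norm else 0.0

# Vectors over the dimensions (science, field, intelligence).
query = (1, 1, 2)
documents = {"D1": (1, 0, 0), "D2": (1, 2, 0), "D3": (0, 2, 1)}

scores = {name: round(cosine(vec, query), 2) for name, vec in documents.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

print(scores)   # {'D1': 0.41, 'D2': 0.55, 'D3': 0.73}
print(ranking)  # ['D3', 'D2', 'D1']
```

Computed this way, the dot product for D3 is 0 + 2 + 2 = 4, which gives 4 / (√5 √6) ≈ 0.73.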
The difference between the Boolean model and the vector model is that the Boolean model shows which
documents will appear in the search results but does not show the order in which they
will appear. The vector model also gives the order in which the
documents will appear when the search query is run, based on the similarity of
each document with the query.
Question 2
a. Target and designed queries
My two search engines are Google and Yahoo
My target is Target 3: obtain the unit guide of SIT773
Designed search queries
Query 1= SIT773 unit guide
Query 2= SIT773 (unit,course) guide
Expressed as Boolean queries, they become:
Query 1= (SIT773 ∧ unit ∧ guide)
Query 2= (SIT773 ∧ (unit ∧ course) ∧ guide)
Google search engine precision vs recall for query 1, query 2 and the average
Figure 1: Google Search Engine
b. Yahoo search engine
Figure 2: Yahoo search engine
c. Average for Google and Yahoo
Figure 3: Comparison by average
Evaluation
According to the chart in Figure 3, Google is superior to Yahoo: it has both higher precision and
higher recall. Google's fraction of relevant results among the retrieved results is higher than
Yahoo's. The same applies to recall, the fraction of all relevant results that were actually
retrieved, which is also higher for Google than for Yahoo.
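The two measures used in this evaluation can be sketched as follows. The `precision_recall` helper is an illustrative assumption, and the result lists are hypothetical placeholder data, not the measurements behind Figures 1 to 3.

```python
def precision_recall(retrieved, relevant):
    """Precision = relevant retrieved / retrieved; recall = relevant retrieved / relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 5 results returned, 4 relevant pages exist, 2 of them were returned.
retrieved = ["r1", "r2", "r3", "r4", "r5"]
relevant = ["r1", "r3", "r6", "r7"]

p, r = precision_recall(retrieved, relevant)
print(p, r)  # 0.4 0.5
```

Averaging these values over Query 1 and Query 2 for each engine would give the kind of per-engine comparison the evaluation describes.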
Bibliography
1&1, 2017. Information Retrieval: the Great Search for Knowledge. 1&1 Digital guide. Available at:
https://www.1and1.com/digitalguide/online-marketing/search-engine-marketing/information-retrieval-how-search-engines-retrieve-data/ [Accessed February 2, 2018].
Manning, C.D., Raghavan, P. & Schütze, H., 2008. Introduction to Information Retrieval. Cambridge:
Cambridge University Press.