Analysis of Inverted Index, Boolean and Vector Queries in AI

Added on  2020/05/16

Homework Assignment
AI Summary
This assignment explores core concepts in information retrieval and search engine technology. It begins by constructing an inverted index for a set of documents, detailing the steps of tokenization, stop word removal, stemming, and merging terms with document frequencies. The solution then applies Boolean and vector queries, specifically cosine similarity, to retrieve relevant documents based on search terms. A comparative analysis is performed between Boolean and vector models, highlighting the vector model's accuracy. Furthermore, the assignment evaluates two search engines, Google and Bing, using designed queries to find a MongoDB manual, comparing their performance based on precision and recall, and concluding that Google is superior. The document includes a bibliography referencing key information retrieval sources.
Contents
Question 1
1 Creating inverted index
2 Boolean and vector queries
Question 2
Bibliography
Question 1
1 Creating inverted index
The three documents are:
Computer vision
Computer vision is concerned with the automatic extraction, analysis and understanding of useful
information from a single image or a sequence of images
Security and privacy
Security and Privacy has been the premier forum for presenting developments in computer security and
electronic privacy, and for bringing together researchers and practitioners in the field.
Search Engines
Using search engines effectively is now a key skill for researchers, but could more be done to equip
young researchers with the tools they need
a. Stop words
After removing the stop words, the documents become:
Computer vision
Computer vision concerned automatic extraction analysis understanding useful information
single image sequence images
Security and privacy
Security Privacy premier forum presenting developments computer security electronic privacy
bringing together researchers practitioners field
Search engines
Using search engines effectively key skill researchers equip young researchers tools
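The stop word removal above can be sketched in Python as follows. This is a minimal illustration, not the method used for the assignment: the stop list is an assumption assembled from the words that were actually dropped from the three documents, and the names `STOP_WORDS` and `remove_stop_words` are hypothetical.

```python
import re

# Assumed stop list: only the words removed from the three documents above.
# A real system would use a standard, much longer list.
STOP_WORDS = {
    "a", "and", "be", "been", "but", "could", "done", "for", "from", "has",
    "in", "is", "more", "need", "now", "of", "or", "the", "they", "to", "with",
}

DOC1 = ("Computer vision is concerned with the automatic extraction, analysis "
        "and understanding of useful information from a single image or a "
        "sequence of images")

def remove_stop_words(text):
    """Split text into alphabetic tokens and drop those in the stop list."""
    tokens = re.findall(r"[A-Za-z]+", text)
    return [t for t in tokens if t.lower() not in STOP_WORDS]

print(" ".join(remove_stop_words(DOC1)))
```

The comparison is done on the lowercased token so that capitalised stop words are also removed, while the surviving tokens keep their original case.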
After applying the Porter stemming algorithm, the documents become:
Computer vision (Doc1)
Comput vision concern automat extract analysi understand us inform singl imag sequenc imag
Security and privacy (Doc2)
Secur Privaci premier forum present develop comput secur electron privaci bring togeth
research practition field
Search engines (Doc3)
Us search engin effect kei skill research equip young research tool
b. Create a merged inverted list including within-document frequencies for each term
Doc 1
Compute vision concern automat extract analyse understand us inform single image sequence
image
Doc 2
Secure Privacy premier forum present develop compute secure electron privacy bring together
research practition field
Doc 3
use search engine effect key skill research
equip young research tool
Step 1: List normalized tokens for each document
Term Doc ID
compute 1
vision 1
concern 1
automat 1
extract 1
analyse 1
understand 1
us 1
inform 1
single 1
image 1
sequence 1
image 1
secure 2
privacy 2
premier 2
forum 2
present 2
develop 2
compute 2
secure 2
electron 2
privacy 2
bring 2
together 2
research 2
practition 2
field 2
use 3
search 3
engine 3
effect 3
key 3
skill 3
research 3
equip 3
young 3
research 3
tool 3
Step 2: Sort the terms alphabetically
Term Doc ID
analyse 1
automat 1
bring 2
compute 1
compute 2
concern 1
develop 2
effect 3
electron 2
engine 3
equip 3
extract 1
field 2
forum 2
image 1
image 1
inform 1
key 3
practition 2
premier 2
present 2
privacy 2
privacy 2
research 2
research 3
research 3
search 3
secure 2
secure 2
sequence 1
single 1
skill 3
together 2
tool 3
understand 1
us 1
use 3
vision 1
young 3
Step 3: Merge multiple occurrences of the same term
Term DOC ID Term freq
analyse 1 1
automat 1 1
bring 2 1
compute 1 1
compute 2 1
concern 1 1
develop 2 1
effect 3 1
electron 2 1
engine 3 1
equip 3 1
extract 1 1
field 2 1
forum 2 1
image 1 2
inform 1 1
key 3 1
practition 2 1
premier 2 1
present 2 1
privacy 2 2
research 2 1
research 3 2
search 3 1
secure 2 2
sequence 1 1
single 1 1
skill 3 1
together 2 1
tool 3 1
understand 1 1
us 1 1
use 3 1
vision 1 1
young 3 1
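Steps 1 to 3 can be sketched in a few lines of Python. The token lists are copied from the normalized documents in part b. The names `DOCS` and `merged_postings` are hypothetical, and this is one possible way to do the merge rather than a prescribed one.

```python
from collections import Counter

# Normalized tokens per document, copied from part b.
DOCS = {
    1: ("compute vision concern automat extract analyse understand us "
        "inform single image sequence image").split(),
    2: ("secure privacy premier forum present develop compute secure "
        "electron privacy bring together research practition field").split(),
    3: ("use search engine effect key skill research equip young "
        "research tool").split(),
}

def merged_postings(docs):
    """Return sorted (term, doc_id, term_frequency) triples."""
    # Step 1: list every (term, doc_id) pair.
    pairs = [(term, doc_id) for doc_id, tokens in docs.items() for term in tokens]
    # Step 2: sort alphabetically by term, then by document ID.
    pairs.sort()
    # Step 3: merge duplicates; the count is the within-document frequency.
    counts = Counter(pairs)  # keeps the sorted insertion order
    return [(term, doc_id, tf) for (term, doc_id), tf in counts.items()]

for term, doc_id, tf in merged_postings(DOCS):
    print(term, doc_id, tf)
```

Sorting the pairs before counting means the merged list comes out in dictionary order without a second sort.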
c. Create dictionary and related posting file
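A dictionary and postings file can be derived from the merged list along the following lines. This is only a sketch under assumptions: the layout (a dictionary entry holding the document frequency and an offset into one flat postings list) is a common textbook arrangement and not necessarily the one used in this assignment, only a few rows of the merged list are included, and the function names are hypothetical.

```python
# A few rows of the merged list: (term, doc_id, term_frequency), sorted by term.
MERGED = [
    ("compute", 1, 1), ("compute", 2, 1),
    ("image", 1, 2),
    ("privacy", 2, 2),
    ("research", 2, 1), ("research", 3, 2),
]

def build_dictionary_and_postings(merged):
    """Split a sorted merged list into a dictionary and a flat postings list."""
    dictionary = {}  # term -> (document frequency, offset of first posting)
    postings = []    # flat list of (doc_id, term_frequency)
    for term, doc_id, tf in merged:
        if term not in dictionary:
            dictionary[term] = (0, len(postings))
        df, offset = dictionary[term]
        dictionary[term] = (df + 1, offset)
        postings.append((doc_id, tf))
    return dictionary, postings

def lookup(term, dictionary, postings):
    """Return the postings of a term, or an empty list if it is not indexed."""
    if term not in dictionary:
        return []
    df, offset = dictionary[term]
    return postings[offset:offset + df]

dictionary, postings = build_dictionary_and_postings(MERGED)
print(lookup("research", dictionary, postings))
```

Keeping the postings in one flat list mirrors a postings file on disk: the dictionary stays small and each lookup reads one contiguous slice.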
d. Testing
The three keywords selected from the documents are: compute, research, privacy
A Google search for these keywords returned the three documents as the top results.
2 Boolean and vector queries
a. Boolean queries
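i) (analyse ∧ automate ∧ ¬forum)
This Boolean query returns document 1: it is the only document that contains both analyse and automat (the indexed form of automate), and it does not contain forum.
ii) ((research ∨ resaerch) ∧ privacy ∧ ¬image)
This Boolean query returns document 2 only: research occurs in documents 2 and 3, but privacy occurs only in document 2, which does not contain image.
iii) ((research ∨ resaerch) ∧ (compute ∨ computer) ∧ (develop ∨ developer))
This Boolean query returns document 2 only: research occurs in documents 2 and 3, compute in documents 1 and 2, and develop only in document 2.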
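Boolean queries like these can be checked mechanically with set operations over the postings. The sketch below assumes that query terms are normalized the same way as the index (so automate is looked up as automat) and that variants absent from the index (resaerch, computer, developer) match no documents. `POSTINGS` holds only the terms the queries need, taken from the merged list in part 1b, and the helper names are hypothetical.

```python
ALL_DOCS = {1, 2, 3}

# Document sets for the queried terms, taken from the merged inverted list.
POSTINGS = {
    "analyse": {1}, "automat": {1}, "compute": {1, 2}, "develop": {2},
    "forum": {2}, "image": {1}, "privacy": {2}, "research": {2, 3},
}

def docs(term):
    """Documents containing the term; unknown terms match nothing."""
    return POSTINGS.get(term, set())

def negate(doc_set):
    """NOT: every document that is not in doc_set."""
    return ALL_DOCS - doc_set

# AND is set intersection (&), OR is set union (|).
query_1 = docs("analyse") & docs("automat") & negate(docs("forum"))
query_2 = (docs("research") | docs("resaerch")) & docs("privacy") & negate(docs("image"))
query_3 = ((docs("research") | docs("resaerch"))
           & (docs("compute") | docs("computer"))
           & (docs("develop") | docs("developer")))

for name, result in [("i", query_1), ("ii", query_2), ("iii", query_3)]:
    print(name, sorted(result))
```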
b. Vector space model using cosine similarity
Consider the three dimensions (compute, develop, research) and the query "compute, develop, research, research":
Q = "compute, develop, research, research" = <1,1,2>
For D1
D1 = "compute, develop, develop" = <1,2,0>
$$\sigma(D_1, Q) = \frac{1 \times 1 + 2 \times 1 + 0 \times 2}{\sqrt{1^2 + 2^2 + 0^2}\,\sqrt{1^2 + 1^2 + 2^2}} = \frac{3}{\sqrt{5}\,\sqrt{6}} \approx 0.55$$
For D2
D2 = <1,2,1>
$$\sigma(D_2, Q) = \frac{1 \times 1 + 2 \times 1 + 1 \times 2}{\sqrt{1^2 + 2^2 + 1^2}\,\sqrt{1^2 + 1^2 + 2^2}} = \frac{5}{\sqrt{6}\,\sqrt{6}} \approx 0.83$$
For D3
D3 = <0,0,1>
$$\sigma(D_3, Q) = \frac{0 \times 1 + 0 \times 1 + 1 \times 2}{\sqrt{0^2 + 0^2 + 1^2}\,\sqrt{1^2 + 1^2 + 2^2}} = \frac{2}{\sqrt{1}\,\sqrt{6}} \approx 0.82$$
Thus document 2 (D2) has the highest similarity with the query, followed by D3 and then D1.
Comparing the Boolean model and the vector model, the vector model is more accurate because it ranks the documents: it gives the specific order in which they will appear when the query is run, whereas a Boolean query only returns an unordered set of matching documents.
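The cosine similarity calculation can be reproduced with a short Python sketch. The vectors are the ones stated above over the dimensions (compute, develop, research). The names `cosine`, `QUERY` and `DOC_VECTORS` are hypothetical.

```python
import math

def cosine(d, q):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(d, q))
    norm = math.sqrt(sum(x * x for x in d)) * math.sqrt(sum(y * y for y in q))
    return dot / norm if norm else 0.0

# Term-frequency vectors over (compute, develop, research), as given above.
QUERY = (1, 1, 2)
DOC_VECTORS = {"D1": (1, 2, 0), "D2": (1, 2, 1), "D3": (0, 0, 1)}

# Rank documents from most to least similar to the query.
ranking = sorted(DOC_VECTORS, key=lambda name: cosine(DOC_VECTORS[name], QUERY),
                 reverse=True)

for name in ranking:
    print(name, round(cosine(DOC_VECTORS[name], QUERY), 2))
```

The guard for a zero norm returns 0.0 for an all-zero vector rather than dividing by zero.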
Question 2
a. Target and designed queries
My two search engines are Google and Bing.
My target is Target 9: obtain the MongoDB manual.
Designed search queries
Query 1 = (manual ∨ handbook) ∧ MongoDB
Query 2 = MongoDB ∧ manual
The following chart shows the results for the Google search engine
Figure 1: Google Search Engine
b. Bing search engine
Figure 2: Bing search engine
c. Average for Google and Bing
Figure 3: Comparison by average
Evaluation
The Google search engine is superior to the Bing search engine. The average graph in Figure 3 shows that Google has higher precision relative to recall than Bing. This makes Google the more accurate engine for these queries: it is more precise and reaches a higher recall value than Bing.
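Precision and recall values of the kind plotted in the figures can be computed from a ranked result list with a sketch like the one below. The relevance judgments are made-up placeholders, not the results behind Figures 1 to 3, and `precision_recall_points` is a hypothetical helper name.

```python
def precision_recall_points(ranked_relevance, total_relevant):
    """Return (recall, precision) after each rank position.

    ranked_relevance: one boolean per returned result, in rank order.
    total_relevant: number of relevant documents that exist for the query.
    """
    if total_relevant <= 0:
        raise ValueError("total_relevant must be positive")
    points = []
    hits = 0
    for k, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
        recall = hits / total_relevant   # share of relevant documents found
        precision = hits / k             # share of returned results that are relevant
        points.append((recall, precision))
    return points

# Placeholder judgments for the top five results of one query.
judgments = [True, False, True, True, False]
for recall, precision in precision_recall_points(judgments, total_relevant=4):
    print(f"recall={recall:.2f} precision={precision:.2f}")
```

Averaging the precision values of both queries at the same recall levels would give one curve per search engine, which is the kind of comparison Figure 3 summarises.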
Bibliography
Bondi, L. Inverted Indexing. Information retrieval, 1–25. Retrieved from http://home.deib.polimi.it/lbondi/data/uploads/irdm15-16/slides/07_inverted_indexing_v1.pdf
Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to Information Retrieval. Cambridge University Press.