Inverted Index and Search Engine Queries
Added on 2023/06/15
This article explains the process of creating an inverted index for three topics, applying the Porter stemming algorithm, and testing the index with Boolean and vector queries. It also compares the precision and recall of the Google and Bing search engines for a specific query.
COVER PAGE
Contents
Question 1
Inverted index
Document for each topic
a. Stop words removal
Applying Porter Stemming algorithm
Steps followed to create the inverted index
Step 1: Normalized tokens sorted in alphabetical order
Step 2: Merge terms appearing more than once
b. Dictionary and related posting file
c. Testing
2 Boolean and vector queries
a. Boolean queries
b. Vector model (cosine similarity)
Question 2
Search engines
My Target
Search queries
Google Search engine
Bing search engine
Average for Google and Bing
Bibliography
Question 1
Inverted index
The three selected topics are:
Science
Computer vision
Search Engine
Document for each topic
Science – DOC1
The Union of Concerned Scientists puts rigorous, independent science to work to solve our
planet's most pressing problems
Computer vision – DOC2
Computer vision systems are implemented in a wide range of industrial and scientific
applications
Search engine – DOC3
Companies that advertise on regular search engines like Google have to pay a lot of money per
click for the keyword that is related to their product or service
Creating the inverted index
a. Stop words removal
Science – DOC1
Union Concerned Scientists puts rigorous, independent science solve planet's pressing
problems
Computer vision – DOC2
Computer vision systems implemented wide range industrial scientific applications
Search engine – DOC3
Companies advertise regular search engines like Google pay money per click keyword related
product service
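The stop-word step can be sketched in Python as follows. This is only an illustration: the stop list `STOP_WORDS` and the helper `remove_stop_words` are assumptions, since the report does not say which stop list was used.

```python
import re

# Hypothetical stop list for illustration; the report does not specify one.
STOP_WORDS = {"the", "of", "to", "our", "are", "in", "a", "and", "that", "on",
              "have", "for", "is", "their", "or"}

def remove_stop_words(text):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

doc2 = ("Computer vision systems are implemented in a wide range of "
        "industrial and scientific applications")
print(remove_stop_words(doc2))
```

With this list, the DOC2 sentence keeps the same nine words shown above, in lowercase.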
Applying Porter Stemming algorithm
Science – DOC1
Union Concern Science put rigor depend science solve planet press problem
Computer vision – DOC2
Compute vision system implement wide range industry science apply
Search engine – DOC3
Company advertise regular search engine like Google pay money per click keyword relate
product service
Steps followed to create the inverted index
Step 1: Normalized tokens sorted in alphabetical order
Term        Doc ID
advertise   3
apply       2
click       3
company     3
compute     2
concern     1
depend      1
engine      3
google      3
implement   2
industry    2
keyword     3
like        3
money       3
pay         3
per         3
planet      1
press       1
problem     1
product     3
put         1
range       2
regular     3
relate      3
rigor       1
science     1
science     1
science     2
search      3
service     3
solve       1
system      2
union       1
vision      2
wide        2
Step 2: Merge terms appearing more than once
Term        Frequency   Doc ID
advertise   1           3
apply       1           2
click       1           3
company     1           3
compute     1           2
concern     1           1
depend      1           1
engine      1           3
google      1           3
implement   1           2
industry    1           2
keyword     1           3
like        1           3
money       1           3
pay         1           3
per         1           3
planet      1           1
press       1           1
problem     1           1
product     1           3
put         1           1
range       1           2
regular     1           3
relate      1           3
rigor       1           1
science     2           1
science     1           2
search      1           3
service     1           3
solve       1           1
system      1           2
union       1           1
vision      1           2
wide        1           2
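Steps 1 and 2 can be reproduced with a short Python sketch. The token lists are the stemmed documents from this report, lowercased; the variable names (`docs`, `pairs`, `rows`) are illustrative assumptions.

```python
from collections import Counter

# Stemmed tokens copied from the stemming step, lowercased.
docs = {
    1: "union concern science put rigor depend science solve planet press problem".split(),
    2: "compute vision system implement wide range industry science apply".split(),
    3: ("company advertise regular search engine like google pay money per "
        "click keyword relate product service").split(),
}

# Step 1: one (term, doc_id) pair per token, sorted by term and then by doc ID.
pairs = sorted((term, doc_id) for doc_id, tokens in docs.items() for term in tokens)

# Step 2: merge identical pairs; the count is the frequency within that document.
merged = Counter(pairs)
rows = [(term, freq, doc_id) for (term, doc_id), freq in sorted(merged.items())]

for term, freq, doc_id in rows:
    if term == "science":
        print(term, freq, doc_id)
```

The sketch yields 35 pairs in Step 1 and 34 merged rows in Step 2, with science appearing twice in DOC1 and once in DOC2.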
b. Dictionary and related posting file
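One possible layout for the dictionary and its posting file is sketched below. It is an assumption, not the report's own structure: the dictionary maps each term to its document frequency, and the postings map each term to the documents that contain it, with the term frequency in each.

```python
from collections import defaultdict

# Stemmed tokens from the stemming step, lowercased.
docs = {
    1: "union concern science put rigor depend science solve planet press problem".split(),
    2: "compute vision system implement wide range industry science apply".split(),
    3: ("company advertise regular search engine like google pay money per "
        "click keyword relate product service").split(),
}

# Postings: term -> {doc_id: term frequency in that document}.
postings = defaultdict(dict)
for doc_id, tokens in docs.items():
    for term in tokens:
        postings[term][doc_id] = postings[term].get(doc_id, 0) + 1

# Dictionary: term -> document frequency (number of documents containing it).
dictionary = {term: len(plist) for term, plist in sorted(postings.items())}

print(dictionary["science"], sorted(postings["science"].items()))
```

In this layout science is the only term with a document frequency above 1; every other term points to a single document.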
c. Testing
Testing with the inverted index returned documents related to each of the three topics.
Testing was also done using Google, where most of the results were related to the three topics.
2 Boolean and vector queries
a. Boolean queries
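Query 1: (science ∧ system ∧ ¬advertise)
Returns: DOC2 only (system occurs only in DOC2, which does not contain advertise).
Query 2: ((science ∨ science) ∧ advertise)
Returns: no documents (advertise occurs only in DOC3, which does not contain science).
Query 3: (product ∧ search ∧ relate ∧ (¬website ∨ ¬vision))
Returns: DOC3 only.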
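Boolean queries of this kind can be evaluated with set operations on the postings. The sketch below is a minimal illustration: the postings are read off the term table in Step 2, and the helpers `docs_with` and `not_` are hypothetical names.

```python
ALL_DOCS = {1, 2, 3}

# Doc IDs per query term, read off the term table in Step 2.
postings = {
    "science": {1, 2},
    "system": {2},
    "advertise": {3},
    "product": {3},
    "search": {3},
    "relate": {3},
    "vision": {2},
}

def docs_with(term):
    """Return the doc IDs containing the term; unknown terms match nothing."""
    return postings.get(term, set())

def not_(doc_set):
    """Complement of a result set relative to the whole collection."""
    return ALL_DOCS - doc_set

q1 = docs_with("science") & docs_with("system") & not_(docs_with("advertise"))
q2 = (docs_with("science") | docs_with("science")) & docs_with("advertise")
q3 = (docs_with("product") & docs_with("search") & docs_with("relate")
      & (not_(docs_with("website")) | not_(docs_with("vision"))))

print(q1, q2, q3)
```

AND maps to set intersection, OR to union, and NOT to the complement within the three-document collection. A term that is absent from the index, such as website, matches no document, so its negation matches all three.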
b. Vector model (cosine similarity)
Query: (science, industry, search, search)
The distinct query terms give three dimensions: (science, industry, search). Each document
and the query are written as vectors along these dimensions.
Calculation of cosine similarity
For D1
\[ D_1 = \langle 1, 0, 0 \rangle, \qquad Q = \langle 1, 1, 2 \rangle \]
\[ \sigma(D_1, Q) = \frac{1 \times 1 + 0 \times 1 + 0 \times 2}{\sqrt{1^2 + 0^2 + 0^2}\,\sqrt{1^2 + 1^2 + 2^2}} = \frac{1}{\sqrt{1}\,\sqrt{6}} = 0.41 \]
Thus D1 = 0.41
For D2
\[ D_2 = \langle 1, 1, 0 \rangle, \qquad Q = \langle 1, 1, 2 \rangle \]
\[ \sigma(D_2, Q) = \frac{1 \times 1 + 1 \times 1 + 0 \times 2}{\sqrt{1^2 + 1^2 + 0^2}\,\sqrt{1^2 + 1^2 + 2^2}} = \frac{2}{\sqrt{2}\,\sqrt{6}} = 0.58 \]
Thus D2 = 0.58
For D3
DOC3 contains search once and neither science nor industry, so:
\[ D_3 = \langle 0, 0, 1 \rangle, \qquad Q = \langle 1, 1, 2 \rangle \]
\[ \sigma(D_3, Q) = \frac{0 \times 1 + 0 \times 1 + 1 \times 2}{\sqrt{0^2 + 0^2 + 1^2}\,\sqrt{1^2 + 1^2 + 2^2}} = \frac{2}{\sqrt{1}\,\sqrt{6}} = 0.82 \]
Thus D3 = 0.82
According to the cosine similarity of each document with the query, the documents are
returned in the following order when the query is run:
D3, D2, D1
The vector space model with cosine similarity is a more effective method than the Boolean
model for showing which documents a search engine fetches. This is because the vector space
model ranks the documents: the document with the highest cosine similarity is fetched first.
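The cosine scores can be checked with a short Python sketch. It assumes raw term counts taken from the stemmed documents, so DOC1 becomes <2, 0, 0> rather than <1, 0, 0>; the two vectors point in the same direction, so the cosine value is unchanged. The helper names `vector` and `cosine` are illustrative.

```python
import math
from collections import Counter

# Stemmed tokens from the stemming step, lowercased.
docs = {
    1: "union concern science put rigor depend science solve planet press problem".split(),
    2: "compute vision system implement wide range industry science apply".split(),
    3: ("company advertise regular search engine like google pay money per "
        "click keyword relate product service").split(),
}
query = ["science", "industry", "search", "search"]
dims = ["science", "industry", "search"]  # one dimension per distinct query term

def vector(tokens):
    """Raw term counts along the query dimensions."""
    counts = Counter(tokens)
    return [counts[d] for d in dims]

def cosine(u, v):
    """Dot product divided by the product of the Euclidean norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

q = vector(query)
scores = {doc_id: round(cosine(vector(tokens), q), 2) for doc_id, tokens in docs.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores, ranking)
```

Sorting by descending score gives the order in which a ranked retrieval system would return the documents.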
Question 2
Search engines
Google
Bing
My Target
Obtain the installation document for MongoDB
Search queries
Query 1 = MongoDB install document
Query 2 = mongoDB (installation, setup) document
Google Search engine
Figure 1: Google Search Engine
Bing search engine
Figure 2: Bing search engine
Average for Google and Bing
Figure 3: Comparison by average
Evaluation
Figure 3 compares the two search engines by their average scores. According to the chart, Google
beats Bing on both precision and recall. This means that, for both queries, Google is superior to
Bing: it is more precise and has a higher recall when the same queries are run on both search
engines.
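Precision and recall for one query, and their average over several queries, can be computed as in the sketch below. The result IDs and relevance judgments are hypothetical and are not the values behind Figure 3; the function names are also assumptions.

```python
def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved results; recall = hits / relevant results."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def average(pairs):
    """Average (precision, recall) pairs across queries."""
    n = len(pairs)
    return (sum(p for p, _ in pairs) / n, sum(r for _, r in pairs) / n)

# Hypothetical judgments for one query (NOT the data behind Figure 3):
# IDs of the results an engine returned, and IDs judged relevant.
retrieved = ["r1", "r2", "r3", "r4", "r5"]
relevant = ["r1", "r3", "r6", "r7"]

p, r = precision_recall(retrieved, relevant)
print(p, r)
```

Averaging the per-query pairs for each engine, as `average` does, gives the kind of summary that Figure 3 plots.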
Bibliography
Manning, C., Raghavan, P. & Schütze, H., 2008. Introduction to Information Retrieval. Cambridge:
Cambridge University Press.