Inverted indexes: Types and techniques

Added on  2021/05/31

Contents
Question 1
Creating an inverted index
Stop words removal
Applying Porter Stemming algorithm
Normalized tokens for each document
Merged normalized tokens for the three documents
Sort the normalized tokens in alphabetical order
Create frequencies for tokens per document
Dictionary and related posting file
Testing
Boolean and vector queries
Question 2 IR evaluation
Search engines
Targets
Search queries
Google search Engine
Bing Search Engine
Average for Google and Bing
Bibliography
Question 1
Creating an inverted index
The following documents are used to create the inverted index:
Document 1 (D1)
Information retrieval is the activity of obtaining information resources relevant to an
information need from a collection of information resources. Searches can be based on full-text
or other content-based indexing.
Document 2 (D2)
Information retrieval is finding material of an unstructured nature that satisfies an information
need from within large collections
Document 3 (D3)
Information systems is the study of complementary networks of hardware and software that
people and organizations use to collect, filter, process, create, and distribute data.
To create the inverted index, the following steps are followed:
1) Remove the stop words.
2) Apply the Porter stemming algorithm.
3) Create normalized tokens for each document.
4) Merge the tokens into one list and sort them in alphabetical order.
5) Compute the frequency of each token per document.
Stop words removal
After removing the stop words, the documents become:
D1
Information retrieval activity obtaining information resources relevant information collection
information resources Searches based full-text content-based indexing
D2
Information retrieval finding material unstructured nature satisfies information within large
collections
D3
Information systems study complementary networks hardware software people organizations
collect filter process create distribute data
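The stop word removal step can be sketched in Python as follows. This is a minimal illustration, not the method actually used for the assignment: the STOP_WORDS set is a hypothetical list chosen so that the output for D2 matches the result shown above (note that it treats "need" as a stop word, as the results above do), and remove_stop_words is an assumed helper name.

```python
import re

# Hypothetical stop list (an assumption): chosen to reproduce the
# stop-word-free version of D2 shown in the text.
STOP_WORDS = {
    "a", "an", "and", "be", "can", "from", "is", "need",
    "of", "on", "or", "other", "that", "the", "to", "use",
}


def remove_stop_words(text):
    """Tokenize on letters (keeping hyphenated words) and drop stop words."""
    tokens = re.findall(r"[A-Za-z]+(?:-[A-Za-z]+)*", text)
    return [t for t in tokens if t.lower() not in STOP_WORDS]


d2 = ("Information retrieval is finding material of an unstructured nature "
      "that satisfies an information need from within large collections")

print(" ".join(remove_stop_words(d2)))
```

A real system would normally use a published stop list rather than a hand-picked one, so the exact words removed may differ.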
Applying Porter Stemming algorithm
After applying the Porter stemming algorithm, the documents become:
D1
Information retrieve active obtain inform resource relevant information collect information
resource Search base full text content base index
D2
Information retrieve find material unstructure nature satisfy information within large collect
D3
Information system study complement network hardware software people organ collect filter
process create distribute data
Normalized tokens for each document
D1
Token Document ID
Information 1
Retrieve 1
Active 1
Obtain 1
Inform 1
Resource 1
Relevant 1
Information 1
collect 1
Information 1
Resource 1
Search 1
Base 1
Full 1
Text 1
Content 1
Base 1
index 1
D2
Token Document ID
Information 2
Retrieve 2
Find 2
Material 2
Unstructure 2
Nature 2
Satisfy 2
Information 2
Within 2
Large 2
collect 2
D3
Token Document ID
Information 3
System 3
Study 3
Complement 3
Network 3
Hardware 3
Software 3
People 3
Organ 3
collect 3
Filter 3
Process 3
create 3
Distribute 3
data 3

Merged normalized tokens for the three documents
Term Doc ID
Information 1
Retrieve 1
Active 1
Obtain 1
Inform 1
Resource 1
Relevant 1
Information 1
collect 1
Information 1
Resource 1
Search 1
Base 1
Full 1
Text 1
Content 1
Base 1
index 1
Information 2
Retrieve 2
Find 2
Material 2
Unstructure 2
Nature 2
Satisfy 2
Information 2
Within 2
Large 2
collect 2
Information 3
System 3
Study 3
Complement 3
Network 3
Hardware 3
Software 3
People 3
Organ 3
collect 3
Filter 3
Process 3
create 3
Distribute 3
data 3
Sort the normalized tokens in alphabetical order
Term Doc ID
Active 1
Base 1
Base 1
collect 1
collect 2
collect 3
Complement 3
Content 1
create 3
data 3
Distribute 3
Filter 3
Find 2
Full 1
Hardware 3
index 1
Inform 1
Information 1
Information 1
Information 1
Information 2
Information 2
Information 3
Large 2
Material 2
Nature 2
Network 3
Obtain 1
Organ 3
People 3
Process 3
Relevant 1
Resource 1
Resource 1
Retrieve 1
Retrieve 2
Satisfy 2
Search 1
Software 3
Study 3
System 3
Text 1
Unstructure 2
Within 2
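The merge and sort steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the stemmed strings are copied from the stemming step above and lowercased, and the names stemmed, pairs and sorted_pairs are chosen for this example only.

```python
# Stemmed documents, lowercased, keyed by document ID (copied from the
# stemming step; lowercasing is an assumed normalization choice).
stemmed = {
    1: ("information retrieve active obtain inform resource relevant "
        "information collect information resource search base full text "
        "content base index"),
    2: ("information retrieve find material unstructure nature satisfy "
        "information within large collect"),
    3: ("information system study complement network hardware software "
        "people organ collect filter process create distribute data"),
}

# Merge: one (term, doc_id) pair per token occurrence.
pairs = [(term, doc_id)
         for doc_id, text in stemmed.items()
         for term in text.split()]

# Sort alphabetically by term, then by document ID.
sorted_pairs = sorted(pairs)

for term, doc_id in sorted_pairs:
    print(term, doc_id)
```

Sorting tuples orders by term first and by document ID second, which is the order needed to group repeated terms when the frequencies are counted.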
Create frequencies for tokens per document
Term Frequency Doc ID
Active 1 1
Base 2 1
collect 1 1
collect 1 2
collect 1 3
Complement 1 3
Content 1 1
create 1 3
data 1 3
Distribute 1 3
Filter 1 3
Find 1 2
Full 1 1
Hardware 1 3
index 1 1
Inform 1 1
Information 3 1
Information 2 2
Information 1 3
Large 1 2
Material 1 2
Nature 1 2
Network 1 3
Obtain 1 1
Organ 1 3
People 1 3
Process 1 3
Relevant 1 1
Resource 2 1
Retrieve 1 1
Retrieve 1 2
Satisfy 1 2
Search 1 1
Software 1 3
Study 1 3
System 1 3
Text 1 1
Unstructure 1 2
Within 1 2

Dictionary and related posting file
Testing
Testing of the inverted index using information, system and index keywords was done using Google. The
results w
Boolean and vector queries
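a. Boolean queries
1) (information AND system AND index)
Result: no documents. Information occurs in D1, D2 and D3, system only in D3 and index only in D1, so no single document contains all three terms. (With OR in place of AND, the result would be D1, D2 and D3.)
2) (system AND index)
Result: no documents, since system occurs only in D3 and index only in D1. (With OR in place of AND, the result would be D1 and D3.)
3) (system AND NOT index)
Result: D3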
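Boolean queries over an inverted index reduce to set operations on postings: AND is intersection, OR is union, and AND NOT is set difference. The sketch below assumes the postings implied by the stemmed documents (information in D1, D2 and D3; system in D3; index in D1); the variable names are chosen for this example only.

```python
# Postings for the three query terms, as sets of document IDs
# (derived from the stemmed documents; an illustrative subset of the index).
postings = {
    "information": {1, 2, 3},
    "system": {3},
    "index": {1},
}

# AND is set intersection.
q1 = postings["information"] & postings["system"] & postings["index"]
q2 = postings["system"] & postings["index"]

# AND NOT is set difference.
q3 = postings["system"] - postings["index"]

# OR is set union, shown for comparison.
q_or = postings["information"] | postings["system"] | postings["index"]

print(sorted(q1), sorted(q2), sorted(q3), sorted(q_or))
```

Under these postings the two AND queries match no document, the AND NOT query matches D3, and the OR form matches all three documents.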
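b. Vector model using cosine similarity
Query Q = (information, system, index)
Each document and the query are written as term-frequency vectors over the terms <information, system, index>. The cosine similarity between a document D and the query Q is
$$\sigma(D, Q) = \frac{D \cdot Q}{\lVert D \rVert \, \lVert Q \rVert}$$
Document 1
D1 = <3, 0, 1> (information occurs three times and index once; system does not occur in D1)
Q = <1, 1, 1>
$$\sigma(D_1, Q) = \frac{3 \times 1 + 0 \times 1 + 1 \times 1}{\sqrt{3^2 + 0^2 + 1^2}\,\sqrt{1^2 + 1^2 + 1^2}} = \frac{4}{\sqrt{10}\,\sqrt{3}} \approx 0.73$$
For D2
D2 = <2, 0, 0> (information occurs twice)
Q = <1, 1, 1>
$$\sigma(D_2, Q) = \frac{2 \times 1 + 0 \times 1 + 0 \times 1}{\sqrt{2^2 + 0^2 + 0^2}\,\sqrt{1^2 + 1^2 + 1^2}} = \frac{2}{\sqrt{4}\,\sqrt{3}} \approx 0.58$$
For D3
D3 = <1, 1, 0> (information and system each occur once)
Q = <1, 1, 1>
$$\sigma(D_3, Q) = \frac{1 \times 1 + 1 \times 1 + 0 \times 1}{\sqrt{1^2 + 1^2 + 0^2}\,\sqrt{1^2 + 1^2 + 1^2}} = \frac{2}{\sqrt{2}\,\sqrt{3}} \approx 0.82$$
A cosine similarity between non-negative vectors always lies between 0 and 1. Ranking the documents by decreasing similarity, the order in which they appear in the search results is
D3, D1, D2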
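The cosine similarity ranking can be checked with a short Python sketch. It assumes raw term-frequency vectors over <information, system, index>, taken from the per-document frequencies (D1 = <3, 0, 1>, D2 = <2, 0, 0>, D3 = <1, 1, 0>), with no idf weighting; the function and variable names are chosen for this example only.

```python
import math


def cosine(d, q):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(d, q))
    norm = math.sqrt(sum(x * x for x in d)) * math.sqrt(sum(y * y for y in q))
    return dot / norm if norm else 0.0


# Term-frequency vectors over <information, system, index>.
query = (1, 1, 1)
docs = {"D1": (3, 0, 1), "D2": (2, 0, 0), "D3": (1, 1, 0)}

scores = {name: cosine(vec, query) for name, vec in docs.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

for name in ranking:
    print(name, round(scores[name], 2))
```

Because the vectors are non-negative, every score falls between 0 and 1, which is a quick sanity check on hand calculations.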
Question 2 IR evaluation
Search engines
Two of the top search engines were used to perform the IR evaluation:
Google
Bing
Targets
Target 2: obtain the price of the new Samsung tablet.
Target 3: obtain the manual for installing Tera Term.
Search queries
Query 1= Samsung tablet price
Query 2= tera term installation manual

Google search Engine
Figure 1: Google Search Engine
Bing Search Engine
Figure 2: Bing search engine
Average for Google and Bing
Figure 3: Comparison by average
Figure 3 shows the average precision and recall for Google and Bing. According to the graph, Google has a higher recall and a higher precision than Bing.
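For reference, precision is the fraction of retrieved results that are relevant, and recall is the fraction of relevant results that are retrieved. The sketch below shows the calculation for a single query. The result IDs and relevance judgments are hypothetical placeholders, not the data behind the figures above, and precision_recall is an assumed helper name.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 5 results returned, 4 relevant results exist,
# and 2 of the returned results are relevant.
retrieved = ["r1", "r2", "r3", "r4", "r5"]
relevant = {"r1", "r3", "r6", "r7"}

p, r = precision_recall(retrieved, relevant)
print(p, r)
```

Averaging these values over all queries for each search engine gives the kind of per-engine comparison described above.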
Bibliography
Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to Information Retrieval. Cambridge University Press.