Inverted Index and IR Evaluation for Desklib Online Library


Added on  2023/06/12

AI Summary
This article discusses the creation of an inverted index for Desklib online library, including stop words removal and Porter Stemming algorithm. It also covers IR evaluation using Google and Ask.com search engines, with target queries and search queries provided. The article includes step-by-step instructions and relevant screenshots.

COVER PAGE
DETAILS

Contents
COVER PAGE
DETAILS
Question 1
Inverted index
Document for each topic
a. Stop words removal
Applying Porter Stemming algorithm
Steps followed to create the inverted index
Step 1: create normalized tokens from every document
Step 2: Normalized tokens sorted in alphabetical order
b. Step 3: Merge terms appearing more than once
c. dictionary and related posting file
d. Testing
Boolean and vector queries
Question 2 IR evaluation
a. Target and designed queries
Search engines
Target
Search queries
b. List your target, results and designed search queries
A. Google Search engine
B. Ask.com
C. Average comparison
Question 1
Inverted index
Document for each topic
DOC1
Information retrieval is the activity of obtaining information resources relevant to an
information need from a collection of information resources. Searches can be based on full-text
or other content-based indexing.
DOC2
Information retrieval is finding material of an unstructured nature that satisfies an information
need from within large collections.
DOC3
Information systems is the study of complementary networks of hardware and software that
people and organizations use to collect, filter, process, create, and distribute data.
The following steps are followed to create an inverted index:
a. Stop words removal
After removing the stop words, the documents become:
DOC1
Information retrieval activity obtaining information resources relevant information collection
information resources Searches based full-text content-based indexing
DOC2
Information retrieval finding material unstructured nature satisfies information large collections
DOC3
Information systems study complementary networks hardware software people organizations
collect filter process create distribute data
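The stop-word removal above can be sketched in Python as follows. This is a minimal illustration, not the procedure actually used for the assignment: the `STOP_WORDS` set is a hypothetical list chosen only to cover the words dropped from DOC2 (including "need", which the example above also drops), and the function name `remove_stop_words` is an assumption.

```python
import re

# Hypothetical stop list, chosen only to cover the words removed from DOC2
# in this example; a real system would use a larger, standard list.
STOP_WORDS = {"is", "of", "an", "that", "from", "within", "need"}

def remove_stop_words(text):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    # Keep hyphenated words such as "full-text" together as one token.
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

doc2 = ("Information retrieval is finding material of an unstructured nature "
        "that satisfies an information need from within large collections")

print(" ".join(remove_stop_words(doc2)))
```

The sketch lowercases every token, so the output differs from the listing above only in capitalization.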
Applying Porter Stemming algorithm
Next, the Porter stemming algorithm is applied, and the documents become:
DOC1
Information retrieve active obtain inform resource relevant information collect information
resource Search base full text content base index
DOC2
Information retrieve find material unstructure nature satisfy information large collect
DOC3
Information system study complement network hardware software people organ collect filter
process create distribute data
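To give a feel for how suffix stripping works, the sketch below implements only step 1a of the Porter algorithm, the rule group that handles plural endings (SSES to SS, IES to I, SS kept, S removed). It is a partial illustration under that assumption: the full algorithm applies several further steps, so its output can differ from what this function returns. The function name `porter_step_1a` is hypothetical.

```python
def porter_step_1a(word):
    """Apply Porter step 1a (plural suffixes) to a lowercase word."""
    if word.endswith("sses"):
        return word[:-2]   # sses -> ss
    if word.endswith("ies"):
        return word[:-2]   # ies -> i
    if word.endswith("ss"):
        return word        # ss stays unchanged
    if word.endswith("s"):
        return word[:-1]   # drop a plain plural s
    return word

for w in ["resources", "collections", "networks", "systems"]:
    print(w, "->", porter_step_1a(w))
```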
Steps followed to create the inverted index
Step 1: create normalized tokens from every document
Term Doc ID
Information 1
Retrieve 1
Active 1
Obtain 1
Inform 1
Resource 1
Relevant 1
Information 1
Resource 1
Search 1
Base 1
Full 1
Text 1
Content 1
Base 1
index 1
Information 2
Retrieve 2
Find 2
Material 2
Unstructure 2
Nature 2
Satisfy 2
Information 2
Large 2
collect 2
Information 3
System 3
Study 3
Complement 3
Network 3
Hardware 3
Software 3
People 3
Organ 3
collect 3
Filter 3
Process 3
create 3
Distribute 3
data 3

Step 2: Normalized tokens sorted in alphabetical order
Term Doc ID
Active 1
Base 1
Base 1
collect 2
Complement 3
Content 1
Find 2
Full 1
Hardware 3
index 1
Information 1
Information 1
Information 1
Information 2
Information 2
Information 3
Large 2
Material 2
Nature 2
Network 3
Obtain 1
People 3
Relevant 1
Resource 1
Resource 1
Retrieve 1
Retrieve 2
Satisfy 2
Search 1
Software 3
Study 3
System 3
Text 1
Unstructure 2
b. Step 3: Merge terms appearing more than once
Term Frequency Doc ID
Active 1 1
Base 2 1
collect 1 2
Complement 1 3
Content 1 1
Find 1 2
Full 1 1
Hardware 1 3
index 1 1
Information 3 1
Information 2 2
Information 1 3
Large 1 2
Material 1 2
Nature 1 2
Network 1 3
Obtain 1 1
People 1 3
Relevant 1 1
Resource 2 1
Retrieve 1 1
Retrieve 1 2
Satisfy 1 2
Search 1 1
Software 1 3
Study 1 3
System 1 3
Text 1 1
Unstructure 1 2
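Steps 1 to 3 can be sketched in Python as follows: generate (term, document ID) pairs, sort them, and merge duplicates into a per-document frequency. This is a minimal sketch, assuming the stemmed documents listed earlier as input (lowercased); the names `docs` and `build_inverted_index` are illustrative, not part of the original assignment.

```python
from collections import Counter, defaultdict

# Stemmed documents as listed in the stemming step, lowercased.
docs = {
    1: ("information retrieve active obtain inform resource relevant "
        "information collect information resource search base full text "
        "content base index"),
    2: ("information retrieve find material unstructure nature satisfy "
        "information large collect"),
    3: ("information system study complement network hardware software "
        "people organ collect filter process create distribute data"),
}

def build_inverted_index(docs):
    """Return a dictionary mapping each term to {doc_id: term frequency}."""
    # Step 1: one (term, doc_id) pair per token occurrence.
    pairs = [(term, doc_id)
             for doc_id, text in docs.items()
             for term in text.split()]
    # Step 2: sort alphabetically by term, then by document ID.
    pairs.sort()
    # Step 3: merge repeated pairs into a frequency per (term, doc_id).
    index = defaultdict(dict)
    for (term, doc_id), freq in Counter(pairs).items():
        index[term][doc_id] = freq
    return dict(index)

index = build_inverted_index(docs)
for term in ("base", "information", "retrieve"):
    print(term, index[term])
```

Each key of `index` is a dictionary entry, and the mapping stored under it plays the role of the posting list for that term.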
c. dictionary and related posting file

d. Testing
The inverted index was tested by searching for the index terms on Google; the results returned by the search engine were related to those terms.
Boolean and vector queries
e. Boolean queries
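i) (information AND system AND index)
Result: no documents, because no single document contains all three terms (information appears in D1, D2 and D3, system only in D3, and index only in D1)
ii) (system AND index)
Result: no documents, because system appears only in D3 and index only in D1
iii) (system AND NOT index)
Result: D3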
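Boolean queries over an inverted index reduce to set operations on posting lists: AND is intersection and AND NOT is set difference. The sketch below is a minimal illustration, assuming the posting lists for the three query terms as they appear in the merged table of Step 3; the names `postings` and `docs_for` are hypothetical.

```python
# Posting lists (sets of document IDs) for the query terms, taken from the
# merged term table in Step 3.
postings = {
    "information": {1, 2, 3},
    "system": {3},
    "index": {1},
}

def docs_for(term):
    """Return the set of document IDs containing the term (empty if unseen)."""
    return postings.get(term.lower(), set())

# i) information AND system AND index  -> intersection of all three lists
q1 = docs_for("information") & docs_for("system") & docs_for("index")
# ii) system AND index                 -> intersection of two lists
q2 = docs_for("system") & docs_for("index")
# iii) system AND NOT index            -> set difference
q3 = docs_for("system") - docs_for("index")

for label, result in (("i", q1), ("ii", q2), ("iii", q3)):
    print(label, sorted(result))
```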
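f. Vector model using cosine similarity
Query Q = (information, system, index)
Each document and the query are represented as term-frequency vectors over the terms (information, system, index). The cosine similarity is
\[ \sigma(D, Q) = \frac{D \cdot Q}{\lVert D \rVert \, \lVert Q \rVert} \]
Document 1
D1 = <information, information, information, index> = <3, 0, 1>
Q = <information, system, index> = <1, 1, 1>
\[ \sigma(D_1, Q) = \frac{3 \times 1 + 0 \times 1 + 1 \times 1}{\sqrt{3^2 + 0^2 + 1^2}\,\sqrt{1^2 + 1^2 + 1^2}} = \frac{4}{\sqrt{10}\,\sqrt{3}} \approx 0.73 \]
Document 2
D2 = <information, information> = <2, 0, 0>
Q = <information, system, index> = <1, 1, 1>
\[ \sigma(D_2, Q) = \frac{2 \times 1 + 0 \times 1 + 0 \times 1}{\sqrt{2^2 + 0^2 + 0^2}\,\sqrt{1^2 + 1^2 + 1^2}} = \frac{2}{\sqrt{4}\,\sqrt{3}} \approx 0.58 \]
Document 3
D3 = <information, system> = <1, 1, 0>
Q = <information, system, index> = <1, 1, 1>
\[ \sigma(D_3, Q) = \frac{1 \times 1 + 1 \times 1 + 0 \times 1}{\sqrt{1^2 + 1^2 + 0^2}\,\sqrt{1^2 + 1^2 + 1^2}} = \frac{2}{\sqrt{2}\,\sqrt{3}} \approx 0.82 \]
When the query is searched, the documents will appear in the following order of decreasing similarity:
1. Document 3
2. Document 1
3. Document 2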
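The cosine similarity computation and the resulting ranking can be checked with a short Python sketch. It assumes term-frequency vectors over the terms (information, system, index), with frequencies read from the merged table in Step 3; the names `cosine_similarity` and `doc_vectors` are illustrative.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector shares no terms with anything
    return dot / (norm_a * norm_b)

# Term order: (information, system, index)
query = (1, 1, 1)
doc_vectors = {
    "D1": (3, 0, 1),
    "D2": (2, 0, 0),
    "D3": (1, 1, 0),
}

scores = {name: cosine_similarity(vec, query)
          for name, vec in doc_vectors.items()}

# Rank documents by decreasing similarity to the query.
ranking = sorted(scores, key=scores.get, reverse=True)
for name in ranking:
    print(f"{name}: {scores[name]:.2f}")
```

Because the query vector is the same for every document, its norm scales all scores equally and does not affect the ranking.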
Question 2 IR evaluation
a. Target and designed queries
Search engines
Google
Ask.com
Target
Target 4: obtain the Oracle SQL tutorial.
Search queries
Query 1 = Oracle SQL manual
Query 2 = Oracle SQL tutorial
b. List your target, results and designed search queries
A. Google Search engine
Figure 1: Google Search Engine
Key
Green ------ = precision
White ------ = recall
B. Ask.com
Figure 2: Ask.com search engine
Key
Green ------ = precision
White ------ = recall

C. Average comparison
Figure 3: average comparison
Key
Green ------ = precision
White ------ = recall
Based on the graph shown in Figure 3 above, Google outperforms Ask.com: for both queries Google demonstrates higher precision and recall, so the user gets more results related to what he or she is searching for.
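Precision is the fraction of retrieved results that are relevant, and recall is the fraction of relevant documents that were retrieved. The sketch below shows how the two measures could be computed for one query. The result identifiers and relevance judgments are hypothetical placeholders, not the data behind the figures, and the function name `precision_recall` is an assumption.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: iterable of result IDs returned by the search engine
    relevant:  iterable of IDs judged relevant to the target
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = retrieved_set & relevant_set
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall

# Hypothetical example: ten results returned, six documents judged relevant,
# four of which appear among the results.
retrieved = ["r1", "r2", "r3", "r4", "r5", "r6", "r7", "r8", "r9", "r10"]
relevant = ["r1", "r2", "r4", "r7", "x1", "x2"]

p, r = precision_recall(retrieved, relevant)
print(f"precision = {p:.2f}, recall = {r:.2f}")
```

Averaging these values over both queries for each engine would give the kind of comparison summarized in Figure 3.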