Creating an Inverted Index and IR Evaluation for Desklib


Added on  2023/06/12

AI Summary
This article explains how to create an inverted index for Desklib, including removing stop words and using the Porter Stemming algorithm. It also covers sorting words in alphabetical order, word frequency per document, and testing with Boolean and vector queries. Additionally, it evaluates IR for Desklib using Google and Bing search engines, targets, and search queries.

COVER PAGE (ENTER YOUR DETAILS)

Contents
COVER PAGE (ENTER YOUR DETAILS)
Question 1
Creating an inverted index
Remove stop words
Porter Stemming algorithm
The three documents merged
Sorting words in alphabetical order for all documents
Word frequency per document
Posting file
Testing
Boolean and vector queries
Question 2 IR evaluation
Search engines
Targets
Designed Search queries
Google search Engine
Bing Search Engine
Average for Google and Bing
Bibliography
Question 1
Creating an inverted index
Document 1
Information retrieval is the activity of obtaining information resources relevant to an
information need from a collection of information resources. Searches can be based on full-text
or other content-based indexing.
Document 2
Information retrieval is finding material of an unstructured nature that satisfies an information
need from within large collections
Document 3
Information systems is the study of complementary networks of hardware and software that
people and organizations use to collect, filter, process, create, and distribute data.
Remove stop words
Results
Document 1
Information retrieval activity obtaining information resources relevant information collection
information resources Searches based full-text content-based indexing
Document 2
Information retrieval finding material unstructured nature satisfies information within large
collections
Document 3
Information systems study complementary networks hardware software people organizations
collect filter process create distribute data
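The stop-word step above can be sketched in Python as follows. This is a minimal illustration, not the procedure actually used: the `STOP_WORDS` set is a hypothetical list inferred from the words that disappear between the original documents and the results shown, and the tokenizer simply splits on non-letters (so "full-text" becomes two tokens).

```python
import re

# Hypothetical stop list, inferred from the words dropped in the results above.
# It is an assumption, not a standard stop-word list.
STOP_WORDS = {
    "is", "the", "of", "to", "an", "a", "from", "can", "be", "on", "or",
    "other", "that", "and", "use", "need",
}


def remove_stop_words(text):
    """Split text into alphabetic tokens and drop those found in STOP_WORDS."""
    tokens = re.findall(r"[A-Za-z]+", text)
    return [token for token in tokens if token.lower() not in STOP_WORDS]


doc2 = (
    "Information retrieval is finding material of an unstructured nature "
    "that satisfies an information need from within large collections"
)
print(" ".join(remove_stop_words(doc2)))
```

Run on Document 2, the sketch prints the same word sequence as the Document 2 result listed above.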
Porter Stemming algorithm
Results
Document 1
inform retriev activ obtain inform resourc relev inform collect inform resourc search base full
text content base index
Document 2
inform retriev find materi unstructur natur satisfi inform within larg collect
Document 3
inform system studi complementari network hardwar softwar peopl organ collect filter
process creat distribut data
The three documents merged
Term Document
inform 1
retriev 1
activ 1
obtain 1
inform 1
resourc 1
relev 1
inform 1
collect 1
inform 1
resourc 1
search 1
base 1
full 1
text 1
content 1
base 1
index 1
inform 2
retriev 2
find 2
materi 2
unstructur 2
natur 2
satisfi 2
inform 2
within 2
larg 2
collect 2
inform 3
system 3
studi 3
complementari 3
network 3
hardwar 3
softwar 3
peopl 3
organ 3
collect 3
filter 3
process 3
creat 3
distribut 3
data 3

Sorting words in alphabetical order for all documents
Term Document
activ 1
base 1
base 1
collect 1
collect 2
collect 3
complementari 3
content 1
creat 3
data 3
distribut 3
filter 3
find 2
full 1
hardwar 3
index 1
inform 1
inform 1
inform 1
inform 1
inform 2
inform 2
inform 3
larg 2
materi 2
natur 2
network 3
obtain 1
organ 3
peopl 3
process 3
relev 1
resourc 1
resourc 1
retriev 1
retriev 2
satisfi 2
search 1
softwar 3
studi 3
system 3
text 1
unstructur 2
within 2
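The merge and sort steps can be sketched in a few lines of Python. This is only an illustration under two assumptions: every stem is lowercased, and every occurrence of "information" is reduced to the Porter stem `inform`. The name `STEMMED` and the function `merge_and_sort` are hypothetical.

```python
# Stemmed tokens per document (assumption: lowercased, "information" -> "inform").
STEMMED = {
    1: ("inform retriev activ obtain inform resourc relev inform collect "
        "inform resourc search base full text content base index").split(),
    2: ("inform retriev find materi unstructur natur satisfi inform within "
        "larg collect").split(),
    3: ("inform system studi complementari network hardwar softwar peopl "
        "organ collect filter process creat distribut data").split(),
}


def merge_and_sort(docs):
    """Merge all documents into (term, document id) pairs, sorted by term then id."""
    pairs = [(term, doc_id) for doc_id, terms in docs.items() for term in terms]
    return sorted(pairs)


pairs = merge_and_sort(STEMMED)
for term, doc_id in pairs:
    print(term, doc_id)
```

Sorting tuples orders them by term first and by document number second, which is the order a term-document listing needs before frequencies are counted.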
Word frequency per document
Term Frequency Document
activ 1 1
base 2 1
collect 1 1
collect 1 2
collect 1 3
complementari 1 3
content 1 1
creat 1 3
data 1 3
distribut 1 3
filter 1 3
find 1 2
full 1 1
hardwar 1 3
index 1 1
inform 4 1
inform 2 2
inform 1 3
larg 1 2
materi 1 2
natur 1 2
network 1 3
obtain 1 1
organ 1 3
peopl 1 3
process 1 3
relev 1 1
resourc 2 1
retriev 1 1
retriev 1 2
satisfi 1 2
search 1 1
softwar 1 3
studi 1 3
system 1 3
text 1 1
unstructur 1 2
within 1 2
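Counting how often each term occurs in each document can be sketched with `collections.Counter`. As before, this is a hedged illustration: the `STEMMED` lists assume lowercased stems with "information" reduced to `inform`, and `term_frequencies` is a hypothetical helper name.

```python
from collections import Counter

# Stemmed tokens per document (assumption: lowercased, "information" -> "inform").
STEMMED = {
    1: ("inform retriev activ obtain inform resourc relev inform collect "
        "inform resourc search base full text content base index").split(),
    2: ("inform retriev find materi unstructur natur satisfi inform within "
        "larg collect").split(),
    3: ("inform system studi complementari network hardwar softwar peopl "
        "organ collect filter process creat distribut data").split(),
}


def term_frequencies(docs):
    """Return (term, frequency, document id) rows sorted by term, then document."""
    rows = []
    for doc_id, terms in docs.items():
        for term, freq in Counter(terms).items():
            rows.append((term, freq, doc_id))
    return sorted(rows, key=lambda row: (row[0], row[2]))


rows = term_frequencies(STEMMED)
for term, freq, doc_id in rows:
    print(term, freq, doc_id)
```

Each row carries the same three columns as the table above: the term, its frequency, and the document it was counted in.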
Posting file
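A posting file maps each term to the documents that contain it. The sketch below shows one possible layout, assuming each posting stores a (document id, term frequency) pair; the layout, the `STEMMED` input, and the function name `build_postings` are assumptions for illustration rather than the exact structure used in this assignment.

```python
from collections import defaultdict

# Stemmed tokens per document (assumption: lowercased, "information" -> "inform").
STEMMED = {
    1: ("inform retriev activ obtain inform resourc relev inform collect "
        "inform resourc search base full text content base index").split(),
    2: ("inform retriev find materi unstructur natur satisfi inform within "
        "larg collect").split(),
    3: ("inform system studi complementari network hardwar softwar peopl "
        "organ collect filter process creat distribut data").split(),
}


def build_postings(docs):
    """Map every term to a sorted list of (document id, term frequency) postings."""
    counts = defaultdict(dict)
    for doc_id, terms in docs.items():
        for term in terms:
            counts[term][doc_id] = counts[term].get(doc_id, 0) + 1
    # Sort the dictionary by term and each postings list by document id.
    return {term: sorted(entry.items()) for term, entry in sorted(counts.items())}


postings = build_postings(STEMMED)
for term, plist in postings.items():
    # The length of a postings list is the term's document frequency.
    print(term, len(plist), plist)
```

Keeping the postings sorted by document id is a common design choice because it lets Boolean queries intersect two lists in a single pass.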

Testing
To test the index, the terms that occur most often in the posting file created above are submitted as
queries to two search engines. The results of both search engines are then compared to determine
whether the returned documents are related to the original documents from which the posting file was
built.
Boolean and vector queries
a. Boolean queries
1) information AND system AND index = no documents (system occurs only in Doc3, index only in Doc1)
2) system AND index = no documents
3) information AND index = Doc1
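Boolean AND queries are usually answered by intersecting the sets of documents in which each query term occurs. The sketch below assumes the query words have already been stemmed (information to `inform`, for example) and uses a hypothetical `POSTINGS` table limited to the three query terms, with document sets read off the three source documents.

```python
# Hypothetical postings for the three query terms, read off the source documents.
POSTINGS = {
    "inform": {1, 2, 3},  # "information" occurs in all three documents
    "system": {3},        # "systems" occurs only in Document 3
    "index": {1},         # "indexing" occurs only in Document 1
}


def boolean_and(terms, postings):
    """Return the sorted ids of documents that contain every term in terms."""
    result = None
    for term in terms:
        docs = postings.get(term, set())
        result = docs if result is None else result & docs
    return sorted(result or [])


queries = [
    ["inform", "system", "index"],
    ["system", "index"],
    ["inform", "index"],
]
for query in queries:
    print(" AND ".join(query), "=", boolean_and(query, POSTINGS))
```

An empty list means that no single document contains all of the query terms.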
b. Vector queries
The documents are ranked by the cosine similarity between each document vector D and the query vector Q:
$$\sigma(D, Q) = \frac{D \cdot Q}{\lVert D \rVert \, \lVert Q \rVert}$$
Each vector holds the number of occurrences of the terms (information, system, index).
Query Q = <information, system, index> = <1, 1, 1>
Document 1
D1 = <4, 0, 1> (information occurs four times, system does not occur, index occurs once)
$$\sigma(D_1, Q) = \frac{4 \times 1 + 0 \times 1 + 1 \times 1}{\sqrt{4^2 + 0^2 + 1^2}\,\sqrt{1^2 + 1^2 + 1^2}} = \frac{5}{\sqrt{17}\,\sqrt{3}} \approx 0.70$$
Document 2
D2 = <2, 0, 0>
$$\sigma(D_2, Q) = \frac{2 \times 1 + 0 \times 1 + 0 \times 1}{\sqrt{2^2 + 0^2 + 0^2}\,\sqrt{1^2 + 1^2 + 1^2}} = \frac{2}{\sqrt{4}\,\sqrt{3}} \approx 0.58$$
Document 3
D3 = <1, 1, 0>
$$\sigma(D_3, Q) = \frac{1 \times 1 + 1 \times 1 + 0 \times 1}{\sqrt{1^2 + 1^2 + 0^2}\,\sqrt{1^2 + 1^2 + 1^2}} = \frac{2}{\sqrt{2}\,\sqrt{3}} \approx 0.82$$
Based on the cosine similarity, the order in which a search engine would return the documents is:
1) Document 3
2) Document 1
3) Document 2
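The ranking step can be checked with a short Python sketch. It is a minimal illustration under stated assumptions: the vectors hold raw counts of the terms (information, system, index) taken from the three source documents, and the names `cosine` and `DOC_VECTORS` are hypothetical.

```python
import math


def cosine(doc, query):
    """Cosine similarity of two equal-length term-count vectors."""
    dot = sum(d * q for d, q in zip(doc, query))
    norm = math.sqrt(sum(d * d for d in doc)) * math.sqrt(sum(q * q for q in query))
    return dot / norm if norm else 0.0


# Assumption: raw counts of (information, system, index) in each source document.
QUERY = (1, 1, 1)
DOC_VECTORS = {
    "Document 1": (4, 0, 1),
    "Document 2": (2, 0, 0),
    "Document 3": (1, 1, 0),
}

scores = {name: cosine(vec, QUERY) for name, vec in DOC_VECTORS.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
for name in ranking:
    print(name, round(scores[name], 2))
```

Because the numerator can never exceed the product of the two vector lengths, a cosine similarity of non-negative count vectors always lies between 0 and 1, which is a quick sanity check on hand calculations.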
Question 2 IR evaluation
Search engines
Google
Bing
Targets
Target 1: Obtain the course information for S779.
Target 2: Obtain the price of a new Samsung tablet.
Designed Search queries
Query 1= S779 course information
Query 2= new Samsung tablet price

Google search Engine
Figure 1: Google Search Engine
Bing Search Engine
Figure 2: Bing Search Engine
Average for Google and Bing
Figure 3: Comparison by average
Figure 3 shows that the Google search engine performs better than the Bing search engine: averaged
over the two search queries, Google achieves both higher precision and higher recall.
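Precision is the fraction of returned results that are relevant, and recall is the fraction of relevant results that are returned. The sketch below shows how the per-query values and their average could be computed. The result identifiers and relevance judgments in `runs` are purely hypothetical placeholders, not the data behind the figures in this report.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average(scores):
    """Average the precision and recall values over all queries."""
    count = len(scores)
    return (
        sum(p for p, _ in scores) / count,
        sum(r for _, r in scores) / count,
    )


# Hypothetical judgments for two queries: (retrieved ids, relevant ids).
runs = [
    (["a", "b", "c", "d"], ["a", "c", "e"]),
    (["f", "g"], ["f", "g", "h", "i"]),
]
scores = [precision_recall(retrieved, relevant) for retrieved, relevant in runs]
avg_precision, avg_recall = average(scores)
print(round(avg_precision, 2), round(avg_recall, 2))
```

Running the same calculation once per search engine gives the pair of averages that a comparison chart such as Figure 3 would plot.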
Bibliography
Brasetvik, A. (2013). Elasticsearch from the Bottom Up, Part 1. [online] Elastic. Available at:
https://www.elastic.co/blog/found-elasticsearch-from-the-bottom-up [Accessed 27 May 2018].