Information Retrieval Techniques and Evaluation - Desklib


Added on  2023/04/23

Q 1
a. Stop word removal and Porter's stemming algorithm
Stop word removal
Document 1: Information retrieval activity obtaining information resources relevant information collection information resources Searches based full-text content-based indexing
Document 2: Information retrieval finding material unstructured nature satisfies information within large collections
Document 3: Information systems study complementary networks hardware software people organizations collect filter process create distribute data
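Stop word removal can be sketched in a few lines of Python. The stop list below is a small hypothetical one chosen for illustration, because the assignment does not say which list was used. The sample sentence is also illustrative and is not the original document text.

```python
import re

# Hypothetical stop list for illustration; the assignment does not state the list it used.
STOP_WORDS = {
    "a", "an", "and", "are", "be", "by", "can", "for", "from", "in", "is",
    "of", "on", "or", "other", "that", "the", "to", "with",
}


def remove_stop_words(text: str) -> list[str]:
    """Lowercase the text, split it into alphabetic tokens and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


if __name__ == "__main__":
    sample = "Information retrieval is the activity of obtaining information resources"
    print(remove_stop_words(sample))
```

Splitting on non-letters also breaks hyphenated words such as full-text into separate tokens.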
Application of Porter's stemming algorithm
Document 1: inform retriev activ obtain inform resourc relev inform collect inform resourc search base full text content base index
Document 2: inform retriev find materi unstructur natur satisfi inform within larg collect
Document 3: inform system studi complementari network hardwar softwar peopl organ collect filter process creat distribut data
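Porter's algorithm applies five ordered steps of suffix-rewriting rules. The minimal sketch below covers only the first step, 1a, which strips plural endings. Later steps, such as removing a final e, turn resource into resourc. The function name porter_step_1a is hypothetical.

```python
def porter_step_1a(word: str) -> str:
    """Step 1a of Porter's algorithm: strip plural endings.

    Rules, checked longest suffix first:
      SSES -> SS, IES -> I, SS -> SS, S -> (removed)
    """
    if word.endswith("sses"):
        return word[:-2]  # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]  # ponies -> poni
    if word.endswith("ss"):
        return word  # process stays process
    if word.endswith("s"):
        return word[:-1]  # resources -> resource
    return word


if __name__ == "__main__":
    print([porter_step_1a(w) for w in ["resources", "satisfies", "collections", "process"]])
```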
b. Merged inverted list
To create the merged inverted list, the following steps are followed:
1. Take the documents obtained after stop word removal and Porter stemming, and create a table listing each term and the document that contains it.
2. Sort the table from step 1 in ascending order of term.
3. Merge the entries for each term to show its within-document frequency in every document, as shown in the table below.
Microsoft Excel is a convenient tool for these steps because it automates most of them, for example sorting the terms in ascending order.
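The three steps can be sketched in Python as follows. The sketch assumes the stemmed documents are lowercased and whitespace-separated, and it writes the stem of information as inform, which is what Porter's algorithm produces for that word. The names DOCS and build_inverted_index are hypothetical.

```python
from collections import Counter

# Stemmed documents from part (a), lowercased; "inform" is the Porter stem of "information".
DOCS = {
    1: "inform retriev activ obtain inform resourc relev inform collect inform resourc "
       "search base full text content base index",
    2: "inform retriev find materi unstructur natur satisfi inform within larg collect",
    3: "inform system studi complementari network hardwar softwar peopl organ "
       "collect filter process creat distribut data",
}


def build_inverted_index(docs: dict[int, str]) -> dict[str, dict[int, int]]:
    """Map each term to {doc_id: within-document frequency}, terms in ascending order."""
    index: dict[str, dict[int, int]] = {}
    # Step 1: record every (term, document) pair, counting repeats within a document.
    for doc_id, text in docs.items():
        for term, tf in Counter(text.split()).items():
            index.setdefault(term, {})[doc_id] = tf
    # Steps 2 and 3: sort by term so each term appears once with its merged postings.
    return dict(sorted(index.items()))


if __name__ == "__main__":
    for term, postings in build_inverted_index(DOCS).items():
        print(f"{term:<14} {postings}")
```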
c. Posting file
d. Inverted index test
The inverted index can be tested using the keywords information, system and index. A search query is built from the three keywords and submitted to a search engine such as Google or Bing. If the inverted index is correct, the results returned should be related to the three documents. In my test using Google, the first 20 results returned were related to information systems.
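The index can also be checked directly, without a web search engine. Each of the three keywords is looked up, and the check confirms that it points to the documents that contain it. This is a minimal sketch. It assumes the stemmed documents from part (a), lowercased, with inform as the stem of information. KEYWORDS, build_index and check_index are hypothetical names.

```python
from collections import Counter

# Stemmed documents from part (a), lowercased; "inform" is the Porter stem of "information".
DOCS = {
    1: "inform retriev activ obtain inform resourc relev inform collect inform resourc "
       "search base full text content base index",
    2: "inform retriev find materi unstructur natur satisfi inform within larg collect",
    3: "inform system studi complementari network hardwar softwar peopl organ "
       "collect filter process creat distribut data",
}

# Hypothetical mapping from each test keyword to its stem.
KEYWORDS = {"information": "inform", "system": "system", "index": "index"}


def build_index(docs: dict[int, str]) -> dict[str, dict[int, int]]:
    """Map each term to {doc_id: within-document frequency}."""
    index: dict[str, dict[int, int]] = {}
    for doc_id, text in docs.items():
        for term, tf in Counter(text.split()).items():
            index.setdefault(term, {})[doc_id] = tf
    return index


def check_index(index: dict[str, dict[int, int]]) -> dict[str, dict[int, int]]:
    """Return the postings for each keyword; an empty dict means the keyword is missing."""
    return {word: index.get(stem, {}) for word, stem in KEYWORDS.items()}


if __name__ == "__main__":
    for word, postings in check_index(build_index(DOCS)).items():
        print(f"{word}: {postings}")
```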
e. Boolean queries
i. Active AND Base
Returns Doc1
ii. Retrieve AND Information
Returns Doc1 & Doc2
iii. (Retrieve OR Retreive) AND Search
Returns Doc1
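Boolean retrieval reduces to set operations on postings lists: AND is intersection and OR is union. The sketch below evaluates the three queries. It assumes the query words are stemmed the same way as the documents, so Retreive becomes retreiv, which matches no document. The helper names are hypothetical.

```python
from collections import defaultdict

# Stemmed documents from part (a), lowercased; "inform" is the Porter stem of "information".
DOCS = {
    1: "inform retriev activ obtain inform resourc relev inform collect inform resourc "
       "search base full text content base index",
    2: "inform retriev find materi unstructur natur satisfi inform within larg collect",
    3: "inform system studi complementari network hardwar softwar peopl organ "
       "collect filter process creat distribut data",
}


def build_postings(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each term to the set of document IDs that contain it."""
    postings: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            postings[term].add(doc_id)
    return dict(postings)


POSTINGS = build_postings(DOCS)


def docs_with(term: str) -> set[int]:
    """Postings for a stemmed term; a term absent from the index matches no document."""
    return POSTINGS.get(term, set())


# Query words stemmed as in part (a): Active -> activ, Retrieve -> retriev, Retreive -> retreiv.
RESULTS = {
    "Active AND Base": docs_with("activ") & docs_with("base"),
    "Retrieve AND Information": docs_with("retriev") & docs_with("inform"),
    "(Retrieve OR Retreive) AND Search":
        (docs_with("retriev") | docs_with("retreiv")) & docs_with("search"),
}

if __name__ == "__main__":
    for query, doc_ids in RESULTS.items():
        print(f"{query}: {sorted(doc_ids)}")
```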
f. Vector model using cosine similarity
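Q = (information, system, index)
Doc 1
D1 = <3, 1, 0>, Q = <1, 1, 1>
$$\sigma(D_1, Q) = \frac{3 \times 1 + 1 \times 1 + 0 \times 1}{\sqrt{3^2 + 1^2 + 0^2}\,\sqrt{1^2 + 1^2 + 1^2}} = \frac{4}{\sqrt{10}\,\sqrt{3}} \approx 0.73$$
Doc 2
D2 = <2, 0, 0>, Q = <1, 1, 1>
$$\sigma(D_2, Q) = \frac{2 \times 1 + 0 \times 1 + 0 \times 1}{\sqrt{2^2 + 0^2 + 0^2}\,\sqrt{1^2 + 1^2 + 1^2}} = \frac{2}{\sqrt{4}\,\sqrt{3}} \approx 0.58$$
Doc 3
D3 = <1, 1, 0>, Q = <1, 1, 1>
$$\sigma(D_3, Q) = \frac{1 \times 1 + 1 \times 1 + 0 \times 1}{\sqrt{1^2 + 1^2 + 0^2}\,\sqrt{1^2 + 1^2 + 1^2}} = \frac{2}{\sqrt{2}\,\sqrt{3}} \approx 0.82$$
Ranking by cosine similarity: Doc 3, Doc 1, Doc 2.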
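The cosine scores can be checked with a short function. The document vectors are taken exactly as given above. Returning 0.0 for a zero-length vector is an assumption for a case the text does not cover.

```python
import math


def cosine(d: tuple[int, ...], q: tuple[int, ...]) -> float:
    """Cosine similarity: dot(d, q) / (|d| * |q|)."""
    dot = sum(x * y for x, y in zip(d, q))
    norm = math.sqrt(sum(x * x for x in d)) * math.sqrt(sum(y * y for y in q))
    return dot / norm if norm else 0.0


Q = (1, 1, 1)  # query weights for (information, system, index)
DOC_VECTORS = {"Doc 1": (3, 1, 0), "Doc 2": (2, 0, 0), "Doc 3": (1, 1, 0)}  # as given above

SCORES = {name: cosine(vec, Q) for name, vec in DOC_VECTORS.items()}
RANKING = sorted(SCORES, key=SCORES.get, reverse=True)  # highest similarity first

if __name__ == "__main__":
    for name in RANKING:
        print(f"{name}: {SCORES[name]:.3f}")
```

Because both vectors are normalised by their lengths, a cosine score with non-negative weights always lies between 0 and 1.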
Comparison
Both Boolean queries and the vector model retrieve documents for a search query. The vector model with cosine similarity is better, because it ranks the retrieved documents by their similarity score. Boolean queries only return the set of matching documents, with no metric for the order in which they should be retrieved.
Question 2 IR evaluation
Top Search Engines
Bing
Google
Target
Target 2: obtain the price of the new Samsung tablet.
Search queries
Query 1= New Samsung tablet Price
Query 2= Samsung Tablet Cost
Google

Bing
Average comparison
The graph shown above shows the average recall and precision of the two search engines over the two designed queries. It shows that Google performs better than Bing, with both higher recall and higher precision. Recall is the proportion of the relevant results that are retrieved, while precision is the proportion of the retrieved results that are relevant (Joshi, 2016). Google retrieved a larger share of the relevant results than Bing, which gives it the higher recall. A larger share of the results it returned were also relevant, which gives it the higher precision.
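The sketch below computes precision and recall for one query, and their averages over several queries as plotted in the comparison graph. The relevance judgements in the example are hypothetical. For a web search engine, the full set of relevant results is unknown, so in practice it is approximated by the relevant results an assessor identifies.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision = |retrieved & relevant| / |retrieved|; recall = |retrieved & relevant| / |relevant|."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average(pairs: list[tuple[float, float]]) -> tuple[float, float]:
    """Average (precision, recall) over several queries."""
    n = len(pairs)
    return sum(p for p, _ in pairs) / n, sum(r for _, r in pairs) / n


if __name__ == "__main__":
    # Hypothetical judgements for two queries; not the assignment's data.
    q1 = precision_recall({"r1", "r2", "r3", "r4"}, {"r1", "r2", "r5"})
    q2 = precision_recall({"r1", "r2"}, {"r1", "r2", "r3", "r4"})
    print(q1, q2, average([q1, q2]))
```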
Reference
Joshi, R. (2016). Accuracy, Precision, Recall & F1 Score: Interpretation of Performance Measures - Exsilio Blog. [online] Exsilio Blog. Available at: https://blog.exsilio.com/all/accuracy-precision-recall-f1-score-interpretation-of-performance-measures/ [Accessed 22 Jan. 2019].