Deakin University SIT772: Information Retrieval Techniques Assignment
Added on 2023/04/19
Homework Assignment
AI Summary
This document presents a comprehensive solution to an information retrieval assignment. It begins by constructing an inverted index, removing stop words, and applying the Porter stemming algorithm to a set of documents. The merged inverted list and posting file are then created. The solution explores both Boolean and vector models for information retrieval, demonstrating how queries are processed and documents are ranked. The assignment also includes a practical section where search queries are tested on Google and Yahoo search engines, and their performance is evaluated based on precision and recall. The document concludes with a bibliography of the resources used.


Contents
Question 1
Creating an inverted index
Remove stop words
Porter Stemming algorithm
Merged inverted list
Posting file
Testing
Boolean Model and vector Model
Question 2
Bibliography

Question 1
Creating an inverted index
Document 1
Information retrieval is the activity of obtaining information resources relevant to an
information need from a collection of information resources. Searches can be based on full-text
or other content-based indexing.
Document 2
Information retrieval is finding material of an unstructured nature that satisfies an information
need from within large collections.
Document 3
Information systems is the study of complementary networks of hardware and software that
people and organizations use to collect, filter, process, create, and distribute data.
Remove stop words
Results
Document 1
Information retrieval activity obtaining information resources relevant information collection
information resources Searches based full-text content-based indexing
Document 2
Information retrieval finding material unstructured nature satisfies information within large
collections
Document 3
Information systems study complementary networks hardware software people organizations
collect filter process create distribute data
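The stop-word step above can be sketched in a few lines of Python. The stop list below is an assumption: it is inferred from the words that disappear between the original documents and the results, since the assignment does not state which list was used. The function name `remove_stop_words` is likewise only illustrative.

```python
# Stop list inferred from the results above (an assumption, not a standard list).
STOP_WORDS = {
    "a", "an", "and", "be", "can", "from", "is", "need",
    "of", "on", "or", "other", "that", "the", "to", "use",
}

def remove_stop_words(text):
    """Split on whitespace, strip surrounding punctuation, drop stop words."""
    tokens = [token.strip(".,;:") for token in text.split()]
    return [t for t in tokens if t and t.lower() not in STOP_WORDS]

doc2 = ("Information retrieval is finding material of an unstructured nature "
        "that satisfies an information need from within large collections")
print(" ".join(remove_stop_words(doc2)))
```

Hyphenated words such as "full-text" stay intact here because only leading and trailing punctuation is stripped.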
Porter Stemming algorithm
Results
Document 1
Inform retriev activ obtain inform resourc relev inform collect inform resourc Search base full
text content base index
Document 2
Inform retriev find materi unstructur natur satisfi inform within larg collect
Document 3
Inform system studi complementari network hardwar softwar peopl organ collect filter
process creat distribut data
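The full Porter algorithm runs several ordered rewriting steps. As a minimal sketch, the code below implements only its first step (step 1a, which handles plural suffixes); the function name `porter_step_1a` is illustrative. Later steps, not shown here, are what reduce a word such as "collection" further to "collect".

```python
def porter_step_1a(word):
    """Apply only step 1a of the Porter algorithm (plural suffixes)."""
    word = word.lower()
    if word.endswith("sses"):
        return word[:-2]   # sses -> ss
    if word.endswith("ies"):
        return word[:-2]   # ies -> i
    if word.endswith("ss"):
        return word        # ss -> ss (unchanged)
    if word.endswith("s"):
        return word[:-1]   # s -> (removed)
    return word

for w in ["satisfies", "systems", "networks", "process", "collections"]:
    print(w, "->", porter_step_1a(w))
```

The rules are tried longest suffix first, which is why "process" is left alone instead of losing its final "s".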
Merged inverted list
Term            Frequency   Document
activ           1           1
base            2           1
collect         1           1
collect         1           2
collect         1           3
complementari   1           3
content         1           1
creat           1           3
data            1           3
distribut       1           3
filter          1           3
find            1           2
full            1           1
hardwar         1           3
index           1           1
inform          4           1
inform          2           2
inform          1           3
larg            1           2
materi          1           2
natur           1           2
network         1           3
obtain          1           1
organ           1           3
peopl           1           3
process         1           3
relev           1           1
resourc         2           1
retriev         1           1
retriev         1           2
satisfi         1           2
search          1           1
softwar         1           3
studi           1           3
system          1           3
text            1           1
unstructur      1           2
within          1           2
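A merged inverted list of this kind can be produced mechanically by counting terms per document. The sketch below takes the three stemmed documents as input, lowercased, and assumes that every occurrence of "information" stems to "inform". The names `STEMMED` and `build_inverted_list` are illustrative only.

```python
from collections import Counter, defaultdict

# Stemmed documents, lowercased (assumption: "information" always stems to "inform").
STEMMED = {
    1: ("inform retriev activ obtain inform resourc relev inform collect "
        "inform resourc search base full text content base index"),
    2: "inform retriev find materi unstructur natur satisfi inform within larg collect",
    3: ("inform system studi complementari network hardwar softwar peopl "
        "organ collect filter process creat distribut data"),
}

def build_inverted_list(docs):
    """Return {term: {doc_id: term_frequency}} for a {doc_id: text} mapping."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.split()).items():
            index[term][doc_id] = tf
    return dict(index)

inverted = build_inverted_list(STEMMED)
# Print one (term, frequency, document) row per posting, sorted by term.
for term in sorted(inverted):
    for doc_id, tf in sorted(inverted[term].items()):
        print(term, tf, doc_id)
```

Counting directly from the text is a useful cross-check on a hand-built list, particularly for terms such as "inform" and "collect" that occur in more than one document.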
Posting file
Frequency is the total number of occurrences of the word in the collection; Posting lists the documents that contain it.
Word            Frequency   Posting
activ           1           1
base            2           1
collect         3           1, 2, 3
complementari   1           3
content         1           1
creat           1           3
data            1           3
distribut       1           3
filter          1           3
find            1           2
full            1           1
hardwar         1           3
index           1           1
inform          7           1, 2, 3
larg            1           2
materi          1           2
natur           1           2
network         1           3
obtain          1           1
organ           1           3
peopl           1           3
process         1           3
relev           1           1
resourc         2           1
retriev         2           1, 2
satisfi         1           2
search          1           1
softwar         1           3
studi           1           3
system          1           3
text            1           1
unstructur      1           2
within          1           2
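One common way to lay out a posting file is a dictionary that stores, for each word, its collection frequency and a pointer into a single flat postings array. The sketch below illustrates that layout for a handful of terms. The term counts are read off the three stemmed documents, and the names `INVERTED`, `build_posting_file` and `lookup` are hypothetical, not taken from the assignment.

```python
# Per-document term frequencies for a few terms (counted from the stemmed documents).
INVERTED = {
    "base": {1: 2},
    "collect": {1: 1, 2: 1, 3: 1},
    "inform": {1: 4, 2: 2, 3: 1},
    "retriev": {1: 1, 2: 1},
    "system": {3: 1},
}

def build_posting_file(inverted):
    """Split an inverted list into a dictionary and one flat postings array.

    dictionary[term] = (collection_frequency, offset, length), where
    postings[offset:offset + length] holds the (doc_id, tf) pairs for term.
    """
    dictionary, postings = {}, []
    for term in sorted(inverted):
        entries = sorted(inverted[term].items())
        total = sum(tf for _, tf in entries)
        dictionary[term] = (total, len(postings), len(entries))
        postings.extend(entries)
    return dictionary, postings

def lookup(term, dictionary, postings):
    """Return the (doc_id, tf) pairs for term, or [] if it is not indexed."""
    if term not in dictionary:
        return []
    _, offset, length = dictionary[term]
    return postings[offset:offset + length]

dictionary, postings = build_posting_file(INVERTED)
print(dictionary["inform"])
print(lookup("inform", dictionary, postings))
```

Keeping the dictionary small and the postings contiguous means a query only has to read the slice of the postings array that belongs to its terms.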
Testing
Testing the posting file involves querying keywords with a search engine and evaluating the retrieved
documents based on their similarity to the documents used to build the posting file.
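Boolean Model and vector Model
a. Boolean Model
1) information ∧ system ∧ index
Returns no documents: "inform" occurs in Doc1, Doc2 and Doc3, but "system" occurs only in Doc3 and
"index" only in Doc1, so no document contains all three terms.
2) system ∧ index ∧ ¬resource
Returns no documents: no document contains both "system" and "index".
3) (information ∨ info) ∧ index
Returns Doc1: "info" is not an indexed term, so the disjunction matches Doc1, Doc2 and Doc3, and only
Doc1 also contains "index".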
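Boolean queries of this form can be evaluated as set operations on posting lists: AND is intersection, OR is union, and NOT is the complement with respect to the whole collection. The sketch below is one way to check the three queries. The posting sets are read directly off the three stemmed documents, and the names `POSTINGS`, `ALL_DOCS` and `docs` are illustrative.

```python
# Posting sets for the query terms, read off the three stemmed documents.
# "info" has no entry because it never occurs in the collection.
POSTINGS = {
    "inform": {1, 2, 3},
    "system": {3},
    "index": {1},
    "resourc": {1},
}
ALL_DOCS = {1, 2, 3}

def docs(term):
    """Documents containing term (empty set for unindexed terms)."""
    return POSTINGS.get(term, set())

q1 = docs("inform") & docs("system") & docs("index")
q2 = docs("system") & docs("index") & (ALL_DOCS - docs("resourc"))
q3 = (docs("inform") | docs("info")) & docs("index")
print(q1, q2, q3)
```

Because "system" and "index" never occur in the same document, any conjunction that contains both is empty regardless of the remaining terms.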
b. Vector Model
Query Q = (information, system, index)
Each document vector holds the term frequencies of "inform", "system" and "index" in that document, and
similarity is the cosine of the angle between the document vector and the query vector.
Document 1
D1 = <4, 0, 1>
Q = <1, 1, 1>
$$\sigma(D_1, Q) = \frac{4 \times 1 + 0 \times 1 + 1 \times 1}{\sqrt{4^2 + 0^2 + 1^2}\,\sqrt{1^2 + 1^2 + 1^2}} = \frac{5}{\sqrt{17}\,\sqrt{3}} \approx 0.70$$
Document 2
D2 = <2, 0, 0>
Q = <1, 1, 1>
$$\sigma(D_2, Q) = \frac{2 \times 1 + 0 \times 1 + 0 \times 1}{\sqrt{2^2 + 0^2 + 0^2}\,\sqrt{1^2 + 1^2 + 1^2}} = \frac{2}{\sqrt{4}\,\sqrt{3}} \approx 0.58$$
Document 3
D3 = <1, 1, 0>
Q = <1, 1, 1>
$$\sigma(D_3, Q) = \frac{1 \times 1 + 1 \times 1 + 0 \times 1}{\sqrt{1^2 + 1^2 + 0^2}\,\sqrt{1^2 + 1^2 + 1^2}} = \frac{2}{\sqrt{2}\,\sqrt{3}} \approx 0.82$$
The documents are retrieved in the following order: Doc3, Doc1, then Doc2.
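The cosine ranking can be reproduced with a short script. This is a sketch under one assumption: each document vector holds the raw counts of "inform", "system" and "index" in the corresponding stemmed document, with no tf-idf weighting. The names `cosine`, `QUERY` and `DOCS` are illustrative.

```python
import math

def cosine(d, q):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(d, q))
    norm = math.sqrt(sum(x * x for x in d)) * math.sqrt(sum(y * y for y in q))
    return dot / norm if norm else 0.0

QUERY = (1, 1, 1)  # (inform, system, index)
# Raw term counts per document (assumption: counted from the stemmed documents).
DOCS = {"Doc1": (4, 0, 1), "Doc2": (2, 0, 0), "Doc3": (1, 1, 0)}

scores = {name: cosine(vec, QUERY) for name, vec in DOCS.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
for name in ranking:
    print(name, round(scores[name], 2))
```

A quick sanity check on any hand calculation is that cosine similarity between non-negative vectors always lies between 0 and 1.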
Question 2
a. Target
Target 2: obtain the price of the new Samsung Tablet.
Search queries
Query 1 = new Samsung tablet price
Query 2 = new Samsung tablet (price, cost)
Search Engines
o Google
o Yahoo
Google
Figure 1: Google Search Engine
b. Yahoo
Figure 2: Yahoo search engine
c. Google and Yahoo search engines average
Figure 3: Comparison by average
Evaluation
Google is more powerful than Yahoo for the designed search queries. It has higher precision, meaning
that a larger fraction of the results it retrieves are relevant, and higher recall, meaning that it
retrieves a larger fraction of all the relevant results.
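Precision and recall for a single query can be computed from two sets: the results an engine retrieved and the results judged relevant. The sketch below shows the calculation. The result IDs and the two engines' result lists are hypothetical placeholders, not the measurements behind the figures above.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: result IDs returned by the engine.
    relevant:  result IDs judged relevant to the target.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical judgments, for illustration only.
relevant = {"r1", "r2", "r3", "r4", "r5"}
engine_a = ["r1", "r2", "r3", "x1", "r4"]  # 4 of 5 results are relevant
engine_b = ["r1", "x1", "x2", "r2", "x3"]  # 2 of 5 results are relevant
print(precision_recall(engine_a, relevant))
print(precision_recall(engine_b, relevant))
```

Averaging these values over both queries for each engine gives the kind of per-engine comparison that the evaluation describes.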
Bibliography
Google Developers. (2018). Classification: Precision and Recall | Machine Learning Crash
Course | Google Developers. [online] Available at:
https://developers.google.com/machine-learning/crash-course/classification/precision-and-recall
[Accessed 26 Jan. 2019].