Information Retrieval Assignment: Models, Evaluation, and Analysis

Added on  2022/05/25

Homework Assignment
AI Summary
This assignment solution delves into the core concepts of Information Retrieval (IR). It begins by processing and merging documents, applying stop word removal and the Porter stemming algorithm to create a posting file. The solution then explores the Boolean and Vector models, demonstrating their application with example queries and comparing their functionalities. The Boolean model utilizes AND, OR, and NOT operators, while the Vector model employs term frequency-inverse document frequency (TF-IDF) for ranking documents. Furthermore, the assignment evaluates search engines, specifically comparing Bing and Yahoo based on precision and recall metrics, using graphical representations to illustrate their performance in retrieving relevant results. The solution concludes with a comparison of the two search engines based on the evaluation metrics.
Document Page
NAME:
STUDENT ID:
COURSE:
TUTOR:
Contents
Question 1
Boolean model and vector model
Question 2 IR evaluation
Bibliography
Question 1
• Document 1
Information retrieval is the activity of obtaining information resources relevant to an
information need from a collection of information resources. Searches can be based on full-text
or other content-based indexing.
• Document 2
Information retrieval is finding material of an unstructured nature that satisfies an information
need from within large collections
• Document 3
Information systems is the study of complementary networks of hardware and software that
people and organizations use to collect, filter, process, create, and distribute data.
Based on the documents above:
Step 1: Remove stop words
• Doc 1
Information retrieval activity obtaining information resources relevant information collection
information resources Searches based full-text content-based indexing
• Doc 2
Information retrieval finding material unstructured nature satisfies information within large
collections
• Doc 3
Information systems study complementary networks hardware software people organizations
collect filter process create distribute data
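The stop-word step can be sketched in Python as follows. The stop list is hypothetical: it contains only the words that were dropped from the three documents above (including "need" and "use"), and the tokenizer keeps hyphenated words such as full-text whole, as in the lists above.

```python
import re

# Hypothetical stop list: chosen only so that the output matches the lists above.
STOP_WORDS = {
    "a", "an", "and", "be", "can", "from", "is", "need",
    "of", "on", "or", "other", "that", "the", "to", "use",
}

def remove_stop_words(text):
    """Split text into words (hyphenated words stay whole) and drop stop words."""
    tokens = re.findall(r"[A-Za-z]+(?:-[A-Za-z]+)*", text)
    return [t for t in tokens if t.lower() not in STOP_WORDS]

doc2 = ("Information retrieval is finding material of an unstructured nature "
        "that satisfies an information need from within large collections")
print(" ".join(remove_stop_words(doc2)))
```

A real system would normally use a published stop list rather than one tailored to three documents.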
Step 2: Apply the Porter stemming algorithm
• Doc 1
inform retriev activ obtain inform resourc relev inform collect inform resourc search base full
text content base index
• Doc 2
inform retriev find materi unstructur natur satisfi inform within larg collect
• Doc 3
inform system studi complementari network hardwar softwar peopl organ collect filter
process creat distribut data
Step 3: Merge the documents
Merged list
Term           Document
activ          1
base           1
base           1
collect        1
collect        2
collect        3
complementari  3
content        1
creat          3
data           3
distribut      3
filter         3
find           2
full           1
hardwar        3
index          1
inform         1
inform         1
inform         1
inform         1
inform         2
inform         2
inform         3
larg           2
materi         2
natur          2
network        3
obtain         1
organ          3
peopl          3
process        3
relev          1
resourc        1
resourc        1
retriev        1
retriev        2
satisfi        2
search         1
softwar        3
studi          3
system         3
text           1
unstructur     2
within         2
Merged list with within-document frequencies
Term           Frequency  Document
activ          1          1
base           2          1
collect        1          1
collect        1          2
collect        1          3
complementari  1          3
content        1          1
creat          1          3
data           1          3
distribut      1          3
filter         1          3
find           1          2
full           1          1
hardwar        1          3
index          1          1
inform         4          1
inform         2          2
inform         1          3
larg           1          2
materi         1          2
natur          1          2
network        1          3
obtain         1          1
organ          1          3
peopl          1          3
process        1          3
relev          1          1
resourc        2          1
retriev        1          1
retriev        1          2
satisfi        1          2
search         1          1
softwar        1          3
studi          1          3
system         1          3
text           1          1
unstructur     1          2
within         1          2
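The merge and count in Step 3 can be sketched with collections.Counter. This is a minimal illustration that assumes the stemmed terms are lowercased first, so that every occurrence of "information" maps to the same stem inform. The names stemmed_docs and merged_frequencies are introduced here for illustration only.

```python
from collections import Counter

# Stemmed terms per document, assumed lowercased before stemming.
stemmed_docs = {
    1: ("inform retriev activ obtain inform resourc relev inform collect inform "
        "resourc search base full text content base index").split(),
    2: "inform retriev find materi unstructur natur satisfi inform within larg collect".split(),
    3: ("inform system studi complementari network hardwar softwar peopl organ "
        "collect filter process creat distribut data").split(),
}

def merged_frequencies(docs):
    """Return (term, frequency, doc_id) rows, one per term per document, sorted by term."""
    rows = []
    for doc_id, terms in docs.items():
        for term, freq in Counter(terms).items():
            rows.append((term, freq, doc_id))
    return sorted(rows, key=lambda row: (row[0], row[2]))

for term, freq, doc_id in merged_frequencies(stemmed_docs):
    print(f"{term:<15}{freq:<11}{doc_id}")
```

Sorting on (term, document) reproduces the alphabetical order of the merged list.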
Step 4: Create Posting file
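Frequency is the total number of occurrences of the term in the collection; Posting lists the documents that contain the term.
Term           Frequency  Posting
activ          1          1
base           2          1
collect        3          1, 2, 3
complementari  1          3
content        1          1
creat          1          3
data           1          3
distribut      1          3
filter         1          3
find           1          2
full           1          1
hardwar        1          3
index          1          1
inform         7          1, 2, 3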
larg           1          2
materi         1          2
natur          1          2
network        1          3
obtain         1          1
organ          1          3
peopl          1          3
process        1          3
relev          1          1
resourc        2          1
retriev        2          1, 2
satisfi        1          2
search         1          1
softwar        1          3
studi          1          3
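A posting file of this kind can be built directly from the stemmed documents. The sketch below is one possible layout, not necessarily the one used for the table: it assumes that the frequency column is the total number of occurrences of a term across the collection and that the posting is the sorted list of documents containing the term. The function name build_posting_file is hypothetical.

```python
from collections import defaultdict

# Stemmed terms per document, assumed lowercased before stemming.
stemmed_docs = {
    1: ("inform retriev activ obtain inform resourc relev inform collect inform "
        "resourc search base full text content base index").split(),
    2: "inform retriev find materi unstructur natur satisfi inform within larg collect".split(),
    3: ("inform system studi complementari network hardwar softwar peopl organ "
        "collect filter process creat distribut data").split(),
}

def build_posting_file(docs):
    """Map each term to (total frequency, sorted list of documents containing it)."""
    totals = defaultdict(int)
    postings = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            totals[term] += 1
            postings[term].add(doc_id)
    return {term: (totals[term], sorted(postings[term])) for term in sorted(totals)}

posting_file = build_posting_file(stemmed_docs)
for term, (freq, doc_ids) in posting_file.items():
    print(f"{term:<15}{freq:<11}{', '.join(map(str, doc_ids))}")
```

Keeping the postings as sorted lists makes the later AND and OR operations simple list or set merges.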
Testing the posting file
Using the keywords system, information and index, the posting file can be tested by running the
same search in a search engine such as Google and checking whether the documents it retrieves
match the documents used to create the posting file.
Boolean model and vector model
a. Boolean Model queries
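1) Network AND People AND NOT search
This query returns Doc3 only.
2) Process AND (retrieve OR retreive)
This query returns no documents: process occurs only in Doc3, retrieve occurs only in Doc1 and Doc2, and the misspelled retreive occurs in none.
3) Information AND Data
This query returns Doc3 only: information occurs in all three documents, but data occurs only in Doc3.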
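The three Boolean queries map onto set operations over the postings: AND is intersection, OR is union, and NOT is the complement with respect to the whole collection. The sketch below assumes the query words are stemmed the same way as the documents and that a term missing from the index (such as the misspelled retreiv) has an empty posting. The names index and docs are illustrative.

```python
ALL_DOCS = {1, 2, 3}

# Postings for the stemmed query terms, taken from the three documents.
index = {
    "network": {3}, "peopl": {3}, "search": {1}, "process": {3},
    "retriev": {1, 2}, "inform": {1, 2, 3}, "data": {3},
}

def docs(term):
    """Return the set of documents containing a term (empty if the term is not indexed)."""
    return index.get(term, set())

q1 = docs("network") & docs("peopl") & (ALL_DOCS - docs("search"))
q2 = docs("process") & (docs("retriev") | docs("retreiv"))
q3 = docs("inform") & docs("data")

print(sorted(q1), sorted(q2), sorted(q3))
```

The results are unordered sets, which is why the Boolean model gives no ranking among the documents it returns.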
b. Vector Model
Query Q= (Information, system, index)
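Doc 1
Each component is the frequency of the corresponding query term in the document: Doc 1 contains information four times, index once, and system not at all.
D1 = <4, 0, 1>
Q = <1, 1, 1>
$$\sigma(D_1, Q) = \frac{4 \times 1 + 0 \times 1 + 1 \times 1}{\sqrt{4^2 + 0^2 + 1^2}\,\sqrt{1^2 + 1^2 + 1^2}} = \frac{5}{\sqrt{17}\,\sqrt{3}} \approx 0.70$$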
Doc 2
D2 = <2, 0, 0>
Q = <1, 1, 1>
$$\sigma(D_2, Q) = \frac{2 \times 1 + 0 \times 1 + 0 \times 1}{\sqrt{2^2 + 0^2 + 0^2}\,\sqrt{1^2 + 1^2 + 1^2}} = \frac{2}{\sqrt{4}\,\sqrt{3}} \approx 0.58$$
Posting file (continued)
Term           Frequency  Posting
system         1          3
text           1          1
unstructur     1          2
within         1          2
Doc 3
D3 = <1, 1, 0>
Q = <1, 1, 1>
$$\sigma(D_3, Q) = \frac{1 \times 1 + 1 \times 1 + 0 \times 1}{\sqrt{1^2 + 1^2 + 0^2}\,\sqrt{1^2 + 1^2 + 1^2}} = \frac{2}{\sqrt{2}\,\sqrt{3}} \approx 0.82$$
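The cosine similarity used above can be checked with a few lines of Python. The document vectors here are an assumption of this sketch: they are the raw counts of inform, system and index in the stemmed documents, with no IDF weighting.

```python
import math

def cosine(d, q):
    """Cosine of the angle between two term-frequency vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(d, q))
    norm = math.sqrt(sum(x * x for x in d)) * math.sqrt(sum(y * y for y in q))
    return dot / norm if norm else 0.0

query = (1, 1, 1)  # (information, system, index)
doc_vectors = {"Doc 1": (4, 0, 1), "Doc 2": (2, 0, 0), "Doc 3": (1, 1, 0)}

scores = {name: cosine(vec, query) for name, vec in doc_vectors.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
for name in ranking:
    print(name, round(scores[name], 2))
```

Because the cosine of two non-negative vectors always lies between 0 and 1, any score above 1 signals an arithmetic slip.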
Boolean Model and Vector Model comparison
In information retrieval, both the Boolean model and the vector model retrieve documents in
response to a query. The Boolean model returns the set of documents that satisfy the query but
does not rank them. The vector model scores every document against the query and returns the
documents in ranked order, from the most similar to the least similar.
Question 2 IR evaluation
Search engines
• Bing
• Yahoo
Target
Target 4: Obtain Oracle SQL Tutorial
Queries
• Query 1 = Oracle SQL Tutorial
• Query 2 = Oracle SQL notes
Bing
Precision and recall after each retrieved result (R = result marked relevant, - = not marked)
Query 1                   Query 2
Rel  Precision  Recall    Rel  Precision  Recall
R    1          0.0714    R    1          0.071
R    1          0.143     R    1          0.143
-    1          0.214     -    0.667      0.143
R    1          0.286     -    0.75       0.214
R    1          0.357     R    0.8        0.289
R    0.833      0.429     -    0.667      0.289
-    0.714      0.429     -    0.571      0.289
-    0.625      0.429     R    0.625      0.357
R    0.667      0.5       R    0.667      0.429
-    0.6        0.5       -    0.6        0.429
-    0.636      0.571     -    0.636      0.5
R    0.667      0.643     R    0.667      0.571
-    0.692      0.714     R    0.615      0.571
-    0.643      0.714     -    0.571      0.571
-    0.6        0.714     R    0.6        0.643
-    0.5625     0.714     -    0.5625     0.643
-    0.529      0.714     -    0.529      0.643
-    0.556      0.786     -    0.556      0.714
R    0.526      0.786     -    0.526      0.714
R    0.55       0.857     R    0.55       0.786
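Precision and recall at each rank follow from the relevance judgments alone: precision is the number of relevant results seen so far divided by the rank, and recall is the same count divided by the total number of relevant documents. In the sketch below, the judgment list is hypothetical (one pattern that is consistent with the Bing Query 2 precision column), and the total of 14 relevant documents is an assumption implied by the first recall value, 1/14 ≈ 0.0714.

```python
def precision_recall_at_ranks(relevance, total_relevant):
    """Return a (precision, recall) pair for every rank of a ranked result list."""
    rows = []
    hits = 0
    for rank, is_relevant in enumerate(relevance, start=1):
        hits += is_relevant
        rows.append((hits / rank, hits / total_relevant))
    return rows

# Hypothetical judgments for 20 results: 1 = relevant, 0 = not relevant.
relevance = [1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1]
TOTAL_RELEVANT = 14  # assumed size of the relevant set

rows = precision_recall_at_ranks(relevance, TOTAL_RELEVANT)
for rank, (precision, recall) in enumerate(rows, start=1):
    print(rank, round(precision, 3), round(recall, 3))
```

Recall can only stay level or rise as the rank increases, while precision drops whenever a non-relevant result is retrieved.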
Interpolated precision at the standard recall levels, and the average over both queries
Recall  Query 1  Query 2  Average
0       1        1        1
0.1     1        1        1
0.2     1        0.8      0.9
0.3     1        0.625    0.8125
0.4     0.833    0.667    0.75
0.5     0.667    0.667    0.667
0.6     0.667    0.615    0.641
0.7     0.526    0.6      0.563
0.8     0.55     0.55     0.55
0.9     0        0        0
1       0        0        0
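The sketch below shows the common textbook rule for 11-point interpolation: the interpolated precision at a recall level is the highest precision observed at any recall greater than or equal to that level, and 0 if that recall is never reached. This rule is an assumption of the sketch. It never lets the curve rise as recall grows, so its output can differ from a table built with a different rule. The input points are the Bing Query 1 (recall, precision) pairs, and average_curves is a hypothetical helper that averages the per-query curves level by level.

```python
def interpolate_11pt(points):
    """points: (recall, precision) pairs. Returns precision at recall 0.0, 0.1, ..., 1.0."""
    levels = [i / 10 for i in range(11)]
    return [max((p for r, p in points if r >= level), default=0.0) for level in levels]

def average_curves(curves):
    """Average several interpolated curves level by level."""
    return [sum(values) / len(values) for values in zip(*curves)]

# (recall, precision) after each of the 20 results for Bing, Query 1.
bing_q1 = [
    (0.0714, 1), (0.143, 1), (0.214, 1), (0.286, 1), (0.357, 1),
    (0.429, 0.833), (0.429, 0.714), (0.429, 0.625), (0.5, 0.667), (0.5, 0.6),
    (0.571, 0.636), (0.643, 0.667), (0.714, 0.692), (0.714, 0.643), (0.714, 0.6),
    (0.714, 0.5625), (0.714, 0.529), (0.786, 0.556), (0.786, 0.526), (0.857, 0.55),
]

curve = interpolate_11pt(bing_q1)
for level, precision in zip(range(11), curve):
    print(level / 10, precision)
```

Averaging is then a per-level mean, for example (1 + 0.8) / 2 = 0.9 at recall 0.2.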
Figure 1: Bing precision plotted against recall
Yahoo search engine
Precision and recall after each retrieved result (R = result marked relevant, - = not marked)
Query 1                   Query 2
Rel  Precision  Recall    Rel  Precision  Recall
R    1          0.0714    R    1          0.071
R    1          0.143     R    1          0.143
-    1          0.214     -    0.667      0.143
R    1          0.286     -    0.5        0.143
R    1          0.357     R    0.6        0.214
R    0.833      0.429     -    0.5        0.214
-    0.714      0.429     -    0.429      0.214
-    0.625      0.429     -    0.375      0.214
R    0.667      0.5       -    0.444      0.286
-    0.6        0.5       -    0.4        0.286
-    0.636      0.571     R    0.455      0.358
R    0.667      0.643     -    0.417      0.358
-    0.692      0.714     R    0.462      0.429
-    0.643      0.714     -    0.4        0.429
-    0.6        0.714     R    0.4375     0.5
-    0.5625     0.714     R    0.412      0.5
-    0.529      0.714     -    0.389      0.5
-    0.556      0.786     -    0.368      0.5
R    0.526      0.786     -    0.4        0.571
R    0.55       0.857
Interpolated precision at the standard recall levels, and the average over both queries
Recall  Query 1  Query 2  Average
0       1        1        1
0.1     1        1        1
0.2     1        0.6      0.8
0.3     1        0.455    0.7275
0.4     0.833    0.462    0.6475
0.5     0.667    0.4375   0.55225
0.6     0.667    0.412    0.5395
0.7     0.526    0        0.263
0.8     0.55     0        0.275
0.9     0        0        0
1       0        0        0
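One simple way to compare the two engines on these figures is to take the mean of the averaged precision values over the 11 recall levels. The sketch below does only that arithmetic on the Average columns of the two interpolation tables. The choice of a single summary number is an assumption of this sketch, not a step stated in the assignment.

```python
# Average interpolated precision at recall 0.0, 0.1, ..., 1.0 (from the tables).
bing_average = [1, 1, 0.9, 0.8125, 0.75, 0.667, 0.641, 0.563, 0.55, 0, 0]
yahoo_average = [1, 1, 0.8, 0.7275, 0.6475, 0.55225, 0.5395, 0.263, 0.275, 0, 0]

def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)

bing_score = mean(bing_average)
yahoo_score = mean(yahoo_average)
print("Bing :", round(bing_score, 3))
print("Yahoo:", round(yahoo_score, 3))
```

On these table values the Bing mean is the higher of the two, which matches the gap between the Query 2 columns at the middle recall levels.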