Information Retrieval Assignment - Data Science and Search Engines
Added on 2023/03/17
Homework Assignment
AI Summary
This assignment solution delves into the core concepts of Information Retrieval (IR). It begins by constructing an inverted index from a set of documents, detailing the processes of stop word removal, Porter stemming, and merging the inverted list with within-document frequencies. The solution then tests the inverted index using keywords and demonstrates the application of both Boolean and Vector models for query processing. The Boolean model utilizes AND and OR operations, while the Vector model employs cosine similarity to rank documents based on their relevance to a query. Furthermore, the assignment evaluates the performance of Google and Bing search engines, comparing their precision in retrieving relevant documents for specific queries related to the price of an Xbox One. The analysis includes a comparison of the search results and a conclusion on the superior performance of Google based on the provided data.

Question 1
1 Creating an inverted index using the following documents
Document 1
Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and
systems to extract knowledge and insights from data in various forms, both structured and
unstructured.
Document 2
Data mining is the process of discovering patterns in large data sets involving methods at the
intersection of machine learning, statistics, and database systems
Document 3
Information systems is the study of complementary networks of hardware and software that
people and organizations use to collect, filter, process, create, and distribute data
1a) Removing stop words
Results
Document 1
Data science interdisciplinary field scientific methods, processes, algorithms systems extract
knowledge insights data various forms, structured unstructured.
Document 2
Data mining process discovering patterns large data sets involving methods intersection
machine learning, statistics, database systems
Document 3
Information systems study complementary networks hardware software people organizations
collect, filter, process, create, distribute data
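The stop word removal step can be sketched in Python as follows. This is only a minimal illustration: the `STOP_WORDS` set is an assumption, inferred from the words that disappear between the original documents and the results above (the assignment does not state which stop list was used), and the function name `remove_stop_words` is hypothetical. Unlike the results above, the sketch also lowercases the text and drops punctuation.

```python
import re

# Assumed stop list, inferred from the words removed in the results above.
STOP_WORDS = {
    "is", "an", "that", "uses", "use", "and", "to",
    "from", "in", "both", "of", "the", "at",
}

def remove_stop_words(text):
    """Lowercase the text, split it into alphabetic tokens, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]

document_2 = (
    "Data mining is the process of discovering patterns in large data sets "
    "involving methods at the intersection of machine learning, statistics, "
    "and database systems"
)

filtered_2 = remove_stop_words(document_2)
print(" ".join(filtered_2))
```

For Document 2 this yields the same sequence of words as the result listed above, apart from case and punctuation.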
1b) Applying the Porter stemming algorithm
Stemmed documents
Document 1
Data scienc interdisciplinari field scientif method process algorithm system extract knowledg
insight data variou form structur unstructur
Document 2
Data mine process discov pattern larg data set involv method intersect machin learn statist
databas system
Document 3
Informat system studi complementari network hardwar softwar peopl organ collect filter
process creat distribut data

1c) Merged inverted list including within-document frequencies
Term Document Frequency
algorithm 1 1
collect 3 1
complementari 3 1
creat 3 1
data 1 2
data 2 2
data 3 1
databas 2 1
discov 2 1
distribut 3 1
extract 1 1
field 1 1
filter 3 1
form 1 1
hardwar 3 1
informat 3 1
insight 1 1
interdisciplinari 1 1
intersect 2 1
involv 2 1
knowledg 1 1
larg 2 1
learn 2 1
machin 2 1
method 1 1
method 2 1
mine 2 1
network 3 1
organ 3 1
pattern 2 1
peopl 3 1
process 1 1
process 2 1
process 3 1
scienc 1 1
scientif 1 1
set 2 1
softwar 3 1
statist 2 1
structur 1 1
studi 3 1
system 1 1
system 2 1
system 3 1
unstructur 1 1
variou 1 1
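A merged list of this kind can be built mechanically from the stemmed documents. The sketch below is one possible way to do it, not necessarily how the table above was produced: it takes the stemmed text from 1b as given, lowercases it, and counts each term per document. The names `stemmed_docs` and `build_inverted_index` are hypothetical.

```python
from collections import Counter, defaultdict

# Stemmed documents copied from section 1b.
stemmed_docs = {
    1: ("Data scienc interdisciplinari field scientif method process algorithm "
        "system extract knowledg insight data variou form structur unstructur"),
    2: ("Data mine process discov pattern larg data set involv method intersect "
        "machin learn statist databas system"),
    3: ("Informat system studi complementari network hardwar softwar peopl "
        "organ collect filter process creat distribut data"),
}

def build_inverted_index(docs):
    """Map each term to {document id: within-document frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, freq in Counter(text.lower().split()).items():
            index[term][doc_id] = freq
    return dict(index)

index = build_inverted_index(stemmed_docs)

# Print one "term document frequency" row per posting, sorted by term.
for term in sorted(index):
    for doc_id, freq in sorted(index[term].items()):
        print(term, doc_id, freq)
```

Looking up a term such as `index["data"]` returns its postings directly, which is the kind of test that 1d applies to the index.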

1c) Dictionary and related posting file
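1d) Testing the inverted index
Keywords: data, system and algorithm
Based on the dictionary and posting file shown in the figure above, and the merged list in 1c, the postings for each keyword are:
data: documents 1, 2 and 3 (within-document frequencies 2, 2 and 1)
system: documents 1, 2 and 3 (frequency 1 in each)
algorithm: document 1 (frequency 1)
The dictionary file can be tested with the three keywords data, system and algorithm: a posting list can be constructed for each word, which confirms the validity of the dictionary file.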
Boolean model and vector model
a. Boolean Model queries
1) data ∧ method
This query returns documents 1 and 2: data occurs in all three documents, but method occurs only in documents 1 and 2.
2) hardware ∧ software
This query returns document 3 only.
3) method ∨ process ∨ system
This query returns all documents, since process and system each occur in every document.
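The Boolean queries above reduce to set operations on posting lists: AND is an intersection and OR is a union. The following sketch assumes a small hand-copied subset of the merged list from 1c, keyed by stemmed term; the `postings` dictionary and the variable names are illustrative only.

```python
# Document ids per stemmed term, copied from the merged list in 1c.
postings = {
    "data": {1, 2, 3},
    "method": {1, 2},
    "process": {1, 2, 3},
    "system": {1, 2, 3},
    "hardwar": {3},
    "softwar": {3},
}

# 1) data AND method: intersect the two posting lists.
query_1 = postings["data"] & postings["method"]

# 2) hardware AND software, using the stemmed forms as keys.
query_2 = postings["hardwar"] & postings["softwar"]

# 3) method OR process OR system: union of the three posting lists.
query_3 = postings["method"] | postings["process"] | postings["system"]

print(sorted(query_1))
print(sorted(query_2))
print(sorted(query_3))
```

Note that the result of each query is an unordered set of matching documents; the Boolean model gives no ranking among them.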
b. Vector model using cosine similarity
Q = (data, system, system, algorithm)
Each vector lists the raw term frequencies of (data, system, algorithm), in that order.
Document 1
D1 = <2, 1, 1>
Q = <1, 2, 1>
$$\sigma(D_1, Q) = \frac{2 \times 1 + 1 \times 2 + 1 \times 1}{\sqrt{2^2 + 1^2 + 1^2}\,\sqrt{1^2 + 2^2 + 1^2}} = \frac{5}{\sqrt{6}\,\sqrt{6}} \approx 0.8333$$
Document 2
D2 = <2, 1, 0>
Q = <1, 2, 1>
$$\sigma(D_2, Q) = \frac{2 \times 1 + 1 \times 2 + 0 \times 1}{\sqrt{2^2 + 1^2 + 0^2}\,\sqrt{1^2 + 2^2 + 1^2}} = \frac{4}{\sqrt{5}\,\sqrt{6}} \approx 0.7303$$
Document 3
D3 = <1, 1, 0>
Q = <1, 2, 1>
$$\sigma(D_3, Q) = \frac{1 \times 1 + 1 \times 2 + 0 \times 1}{\sqrt{1^2 + 1^2 + 0^2}\,\sqrt{1^2 + 2^2 + 1^2}} = \frac{3}{\sqrt{2}\,\sqrt{6}} \approx 0.8660$$
Cosine similarity measures how similar each document is to the query, and the document with the highest similarity value is fetched first. Based on the calculations shown above, the cosine similarities are:
o Document 1
$\sigma(D_1, Q) \approx 0.8333$
o Document 2
$\sigma(D_2, Q) \approx 0.7303$
o Document 3
$\sigma(D_3, Q) \approx 0.8660$
From the results shown above, the order of retrieval of the documents is as follows:
Document 3, Document 1, Document 2.
The vector space model with cosine similarity has an advantage over the Boolean model here: because it scores each document's similarity to the search query, it can rank the retrieved documents, whereas the Boolean model only returns an unordered set of matches.
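The cosine similarity computation can be checked with a few lines of Python. This is a minimal sketch using raw term-frequency vectors over (data, system, algorithm), exactly as defined above; the helper name `cosine_similarity` and the variable names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their Euclidean norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector shares no terms with anything
    return dot / (norm_a * norm_b)

# Term order: (data, system, algorithm)
query = (1, 2, 1)
documents = {
    "Document 1": (2, 1, 1),
    "Document 2": (2, 1, 0),
    "Document 3": (1, 1, 0),
}

scores = {name: cosine_similarity(vec, query) for name, vec in documents.items()}

# Rank documents from most to least similar to the query.
ranking = sorted(scores, key=scores.get, reverse=True)

for name in ranking:
    print(f"{name}: {scores[name]:.4f}")
```

Because the scores are real numbers rather than true/false matches, sorting them gives a retrieval order directly.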
Question 2 IR evaluation
Search engines
Google
Bing
Target
Target 5: Price of new Xbox One
Queries
Query 1= Price of new Xbox one
Query 2= new Xbox one cost
Google Search engine
Query 1 results
Relevant
D(1,2,3,4,5,9,10,11,12,14,16,19,20)
Irrelevant
D(6,7,8,13,15,17,18)
D1. Xbox One - EB Games Australia
D2. The best Xbox One S prices, bundles and sales in Australia (May 2018 ...
D3. The best Xbox One X prices, bundles and sales in Australia (May 2018 ...
D4. Xbox One Packages At JB Hi-Fi Stores + All The Newest Games Out ...
D5. The best price on Xbox One in Australia | finder.com.au
D6. Xbox | Official Site
D7. Xbox One S | Xbox
D8. Xbox | Target Australia
D9. Xbox One Console Deals, Games & Accessories | The Gamesmen
D10. Microsoft's Xbox One X price will start at $499 - The Verge
D11. Buy Xbox One S 500GB Console | Harvey Norman AU
D12. Buy Xbox One X 1TB Console | Harvey Norman AU
D13. xbox one console | Xbox | Gumtree Australia Free Local Classifieds
D14. Xbox One - GameStop
D15. Xbox One and Xbox One S Consoles, Games and Accessories ...
D16. Xbox One - Best Buy
D17. How Much Does an Xbox One Cost? - MakeUseOf
D18. Microsoft Xbox One X: Australian Pricing, Specs And Release Date ...
D19. Xbox One | Entertainment | BIG W
D20. Xbox One S 500GB Consoles | eBay
Query 2 results
Relevant
D(1,2,4,5,6,7,9,11,13,14,16,17,18,19)
Irrelevant
D(3,8,10,12,15,20)
D1. Xbox One - EB Games Australia
D2. The best Xbox One S prices, bundles and sales in Australia (May 2018 ...
D3. The best Xbox One X prices, bundles and sales in Australia (May 2018 ...
D4. Xbox One Packages At JB Hi-Fi Stores + All The Newest Games Out ...
D5. The best price on Xbox One in Australia | finder.com.au
D6. Xbox | Official Site
D7. Xbox One S | Xbox
D8. Xbox | Target Australia
D9. Xbox One Console Deals, Games & Accessories | The Gamesmen
D10. Microsoft's Xbox One X price will start at $499 - The Verge
D11. Buy Xbox One S 500GB Console | Harvey Norman AU
D12. Buy Xbox One X 1TB Console | Harvey Norman AU
D13. xbox one console | Xbox | Gumtree Australia Free Local Classifieds
D14. Xbox One - GameStop
D15. Xbox One and Xbox One S Consoles, Games and Accessories ...
D16. Xbox One - Best Buy
D17. How Much Does an Xbox One Cost? - MakeUseOf
D18. Microsoft Xbox One X: Australian Pricing, Specs And Release Date ...
D19. Xbox One | Entertainment | BIG W
D20. Xbox One S 500GB Consoles | eBay
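Given the relevance judgments listed above, precision for each Google query can be computed as the number of relevant results divided by the number of results retrieved. The sketch below assumes precision is measured over the top 20 results, which is what the Relevant and Irrelevant lists cover; the function name `precision` and the variable names are hypothetical.

```python
def precision(relevant, irrelevant):
    """Fraction of retrieved results that were judged relevant."""
    retrieved = len(relevant) + len(irrelevant)
    return len(relevant) / retrieved if retrieved else 0.0

# Relevance judgments for the Google results, copied from the lists above.
google_q1_relevant = {1, 2, 3, 4, 5, 9, 10, 11, 12, 14, 16, 19, 20}
google_q1_irrelevant = {6, 7, 8, 13, 15, 17, 18}

google_q2_relevant = {1, 2, 4, 5, 6, 7, 9, 11, 13, 14, 16, 17, 18, 19}
google_q2_irrelevant = {3, 8, 10, 12, 15, 20}

p_q1 = precision(google_q1_relevant, google_q1_irrelevant)
p_q2 = precision(google_q2_relevant, google_q2_irrelevant)

print(f"Google, Query 1: precision = {p_q1:.2f}")
print(f"Google, Query 2: precision = {p_q2:.2f}")
```

With 13 and 14 relevant results out of 20, the two queries give precision values of 0.65 and 0.70 respectively.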