Creating an Inverted Index for Data Science, Data Mining, and Information Systems

Added on 2023/03/17

AI Summary

This document explains the process of creating an inverted index for data science, data mining, and information systems. It covers the removal of stop words, the application of the Porter Stemming algorithm, and the merging of inverted lists. The document also discusses testing the inverted index using keywords and explores the Boolean and vector models for querying. Additionally, it evaluates the performance of search engines using precision and recall.

Question 1
1. Creating an inverted index using the following documents
• Document 1
Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and
systems to extract knowledge and insights from data in various forms, both structured and
unstructured.
• Document 2
Data mining is the process of discovering patterns in large data sets involving methods at the
intersection of machine learning, statistics, and database systems.
• Document 3
Information systems is the study of complementary networks of hardware and software that
people and organizations use to collect, filter, process, create, and distribute data.
1a) Removing stop words
Results
• Document 1
Data science interdisciplinary field scientific methods, processes, algorithms systems extract
knowledge insights data various forms, structured unstructured.
• Document 2
Data mining process discovering patterns large data sets involving methods intersection
machine learning, statistics, database systems
• Document 3
Information systems study complementary networks hardware software people organizations
collect, filter, process, create, distribute data
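The stop-word step above can be sketched in a few lines of Python. This is only an illustration: the `STOP_WORDS` set is an assumption containing just the function words that were dropped from the three sample documents, and the helper name `remove_stop_words` is hypothetical. The sketch also lowercases the text and strips punctuation, which the results above do not do.

```python
import re

# Assumed stop-word list: only the function words removed from the three
# sample documents. A real system would use a larger, standard list.
STOP_WORDS = {
    "a", "an", "and", "at", "both", "from", "in", "is", "of",
    "that", "the", "to", "use", "uses",
}


def remove_stop_words(text: str) -> list[str]:
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


doc2 = (
    "Data mining is the process of discovering patterns in large data sets "
    "involving methods at the intersection of machine learning, statistics, "
    "and database systems"
)
print(" ".join(remove_stop_words(doc2)))
```

For Document 2 this prints the same word sequence as the result listed above, in lowercase and without commas.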
1b) Applying the Porter stemming algorithm
Stemmed documents
• Document 1
Data scienc interdisciplinari field scientif method process algorithm system extract knowledg
insight data variou form structur unstructur
• Document 2
Data mine process discov pattern larg data set involv method intersect machin learn statist
databas system
• Document 3
Inform system studi complementari network hardwar softwar peopl organ collect filter
process creat distribut data
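To give a feel for how the stemmer works, here is a minimal sketch of step 1a of the Porter algorithm only, the step that strips plural endings. The function name `porter_step_1a` is hypothetical, and the later steps (for example, the ones that turn "science" into "scienc") are deliberately left out, so this is not a full Porter stemmer.

```python
def porter_step_1a(word: str) -> str:
    """Apply only step 1a of the Porter algorithm (plural suffixes)."""
    if word.endswith("sses"):
        return word[:-2]  # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]  # ponies -> poni
    if word.endswith("ss"):
        return word       # caress -> caress
    if word.endswith("s"):
        return word[:-1]  # methods -> method
    return word


for word in ["methods", "processes", "algorithms", "systems", "patterns"]:
    print(word, "->", porter_step_1a(word))
```

For these five words, step 1a alone already yields the stems shown in the stemmed documents: method, process, algorithm, system and pattern.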

1c) Merged inverted list including within-document frequencies
Term               Document  Frequency
algorithm          1         1
collect            3         1
complementari      3         1
creat              3         1
data               1         2
data               2         2
data               3         1
databas            2         1
discov             2         1
distribut          3         1
extract            1         1
field              1         1
filter             3         1
form               1         1
hardwar            3         1
inform             3         1
insight            1         1
interdisciplinari  1         1
intersect          2         1
involv             2         1
knowledg           1         1
larg               2         1
learn              2         1
machin             2         1
method             1         1
method             2         1
mine               2         1
network            3         1
organ              3         1
pattern            2         1
peopl              3         1
process            1         1
process            2         1
process            3         1
scienc             1         1
scientif           1         1
set                2         1
softwar            3         1
statist            2         1
structur           1         1
studi              3         1
system             1         1
system             2         1
system             3         1
unstructur         1         1
variou             1         1
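The merged list can be reproduced programmatically. The sketch below is one possible way to do it, assuming the stemmed documents are lowercased and whitespace-separated; the names `stemmed_docs` and `build_inverted_index` are hypothetical. Document 3 uses `inform`, the stem the Porter algorithm produces for "information".

```python
from collections import Counter, defaultdict

# Stemmed, lowercased documents from step 1b.
stemmed_docs = {
    1: "data scienc interdisciplinari field scientif method process algorithm "
       "system extract knowledg insight data variou form structur unstructur",
    2: "data mine process discov pattern larg data set involv method intersect "
       "machin learn statist databas system",
    3: "inform system studi complementari network hardwar softwar peopl organ "
       "collect filter process creat distribut data",
}


def build_inverted_index(docs: dict[int, str]) -> dict[str, dict[int, int]]:
    """Map each term to {document number: within-document frequency}."""
    index: dict[str, dict[int, int]] = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, freq in Counter(text.split()).items():
            index[term][doc_id] = freq
    return dict(index)


index = build_inverted_index(stemmed_docs)

# Print one row per (term, document) pair, sorted like the merged list.
for term in sorted(index):
    for doc_id, freq in sorted(index[term].items()):
        print(f"{term:<18} {doc_id:<9} {freq}")
```

Storing the postings as a dictionary per term keeps the within-document frequency next to each document number, which is what the vector model needs later.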

1c) Dictionary and related posting file

1d) Testing the inverted index
Keywords: data, system and algorithm
Based on the dictionary and posting file shown in the figure above
• Data
• System
• Algorithm
The dictionary file can be tested with the three keywords data, system and algorithm: a posting
list can be retrieved for each keyword, which confirms the validity of the
dictionary file.
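The keyword test can be illustrated with a small lookup. In this sketch the `postings` dictionary is copied by hand from the merged inverted list for the three keywords only, and the helper `lookup` is a hypothetical name; it assumes the keywords are already in stemmed form, which holds for data, system and algorithm.

```python
# Postings for the three test keywords, copied from the merged inverted list:
# term -> {document number: within-document frequency}
postings = {
    "data": {1: 2, 2: 2, 3: 1},
    "system": {1: 1, 2: 1, 3: 1},
    "algorithm": {1: 1},
}


def lookup(term: str) -> dict[int, int]:
    """Return the posting list for a term, or an empty dict if it is absent."""
    return postings.get(term.lower(), {})


for keyword in ["Data", "System", "Algorithm"]:
    posting = lookup(keyword)
    doc_freq = len(posting)  # number of documents containing the term
    print(f"{keyword}: document frequency {doc_freq}, postings {posting}")
```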
Boolean model and vector model
a. Boolean model queries
1) Data ∧ Method
This query returns Documents 1 and 2 only, because "method" does not occur in Document 3.
2) Hardware ∧ Software
This query returns Document 3 only.
3) Method ∨ Process ∨ System
This query returns all documents.
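The three Boolean queries can be evaluated with set operations on the posting lists: AND is a set intersection and OR is a set union. The sketch below assumes the document sets are read off the merged inverted list; `postings`, `boolean_and` and `boolean_or` are hypothetical names.

```python
# Document sets per stemmed term, taken from the merged inverted list.
postings = {
    "data": {1, 2, 3},
    "method": {1, 2},
    "process": {1, 2, 3},
    "system": {1, 2, 3},
    "hardwar": {3},
    "softwar": {3},
}


def boolean_and(*terms: str) -> set[int]:
    """Documents that contain every term (set intersection)."""
    sets = [postings.get(term, set()) for term in terms]
    return set.intersection(*sets) if sets else set()


def boolean_or(*terms: str) -> set[int]:
    """Documents that contain at least one term (set union)."""
    return set().union(*(postings.get(term, set()) for term in terms))


print("data AND method:", sorted(boolean_and("data", "method")))
print("hardwar AND softwar:", sorted(boolean_and("hardwar", "softwar")))
print("method OR process OR system:",
      sorted(boolean_or("method", "process", "system")))
```

Since "method" occurs only in Documents 1 and 2, the first query evaluates to those two documents; the second yields Document 3 and the third yields all three.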
b. Vector model using cosine similarity
Q (Data, System, System, Algorithm)
Each vector holds the term frequencies in the order <data, system, algorithm>, so the query is Q = <1, 2, 1>.
Document 1
D1 = <2, 1, 1>
Q = <1, 2, 1>
\[ \sigma(D_1, Q) = \frac{2 \times 1 + 1 \times 2 + 1 \times 1}{\sqrt{2^2 + 1^2 + 1^2}\,\sqrt{1^2 + 2^2 + 1^2}} = \frac{5}{\sqrt{6}\,\sqrt{6}} \approx 0.8333 \]
Document 2
D2 = <2, 1, 0>
Q = <1, 2, 1>
\[ \sigma(D_2, Q) = \frac{2 \times 1 + 1 \times 2 + 0 \times 1}{\sqrt{2^2 + 1^2 + 0^2}\,\sqrt{1^2 + 2^2 + 1^2}} = \frac{4}{\sqrt{5}\,\sqrt{6}} \approx 0.7303 \]
Document 3
D3 = <1, 1, 0>
Q = <1, 2, 1>
\[ \sigma(D_3, Q) = \frac{1 \times 1 + 1 \times 2 + 0 \times 1}{\sqrt{1^2 + 1^2 + 0^2}\,\sqrt{1^2 + 2^2 + 1^2}} = \frac{3}{\sqrt{2}\,\sqrt{6}} \approx 0.8660 \]
Cosine similarity measures how similar each document is to the query; the document with the
highest similarity value is fetched first. The calculations above give the following
cosine similarities:
o Document 1
▪ σ(D1, Q) ≈ 0.8333
o Document 2
▪ σ(D2, Q) ≈ 0.7303
o Document 3
▪ σ(D3, Q) ≈ 0.8660
From these results, the order of retrieval of the documents is as
follows:
Document 3, Document 1, Document 2.
The vector space model with cosine similarity improves on the Boolean model because it scores
each document by its similarity to the search query, so the results can be ranked. The Boolean
model only returns an unordered set of matching documents.
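The ranking step can be checked with a short script. This is a sketch that assumes the vectors D1 = <2, 1, 1>, D2 = <2, 1, 0>, D3 = <1, 1, 0> and Q = <1, 2, 1> given above, with components ordered as (data, system, algorithm); the function name `cosine_similarity` is hypothetical and uses only the standard library.

```python
import math


def cosine_similarity(d: tuple[int, ...], q: tuple[int, ...]) -> float:
    """Dot product of d and q divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(d, q))
    norm_d = math.sqrt(sum(x * x for x in d))
    norm_q = math.sqrt(sum(y * y for y in q))
    if norm_d == 0 or norm_q == 0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_d * norm_q)


query = (1, 2, 1)
documents = {
    "Document 1": (2, 1, 1),
    "Document 2": (2, 1, 0),
    "Document 3": (1, 1, 0),
}

scores = {name: cosine_similarity(vec, query) for name, vec in documents.items()}

# Rank documents from most to least similar to the query.
ranking = sorted(scores, key=scores.get, reverse=True)
for name in ranking:
    print(f"{name}: {scores[name]:.4f}")
```

Each numerator is the dot product of the document vector with Q, so the repeated query term "system" counts twice in every score.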
Question 2: IR evaluation
Search engines
• Google
• Bing
Target
Target 5: Price of new Xbox One
Queries
• Query 1 = Price of new Xbox one
• Query 2 = new Xbox one cost
Google search engine
• Query 1 results
Relevant
D(1,2,3,4,5,9,10,11,12,14,16,19,20)
Irrelevant
D(6,7,8,13,15,17,18)
D1. Xbox One - EB Games Australia
D2. The best Xbox One S prices, bundles and sales in Australia (May 2018 ...
D3. The best Xbox One X prices, bundles and sales in Australia (May 2018 ...
D4. Xbox One Packages At JB Hi-Fi Stores + All The Newest Games Out ...
D5. The best price on Xbox One in Australia | finder.com.au
D6. Xbox | Official Site
D7. Xbox One S | Xbox
D8. Xbox | Target Australia
D9. Xbox One Console Deals, Games & Accessories | The Gamesmen
D10. Microsoft's Xbox One X price will start at $499 - The Verge
D11. Buy Xbox One S 500GB Console | Harvey Norman AU
D12. Buy Xbox One X 1TB Console | Harvey Norman AU
D13. xbox one console | Xbox | Gumtree Australia Free Local Classifieds

D14. Xbox One - GameStop
D15. Xbox One and Xbox One S Consoles, Games and Accessories ...
D16. Xbox One - Best Buy
D17. How Much Does an Xbox One Cost? - MakeUseOf
D18. Microsoft Xbox One X: Australian Pricing, Specs And Release Date ...
D19. Xbox One | Entertainment | BIG W
D20. Xbox One S 500GB Consoles | eBay
• Query 2 results
Relevant
D(1,2,4,5,6,7,9,11,13,14,16,17,18,19)
Irrelevant
D(3,8,10,12,15,20)
D1. Xbox One - EB Games Australia
D2. The best Xbox One S prices, bundles and sales in Australia (May 2018 ...
D3. The best Xbox One X prices, bundles and sales in Australia (May 2018 ...
D4. Xbox One Packages At JB Hi-Fi Stores + All The Newest Games Out ...
D5. The best price on Xbox One in Australia | finder.com.au
D6. Xbox | Official Site
D7. Xbox One S | Xbox
D8. Xbox | Target Australia
D9. Xbox One Console Deals, Games & Accessories | The Gamesmen
D10. Microsoft's Xbox One X price will start at $499 - The Verge
D11. Buy Xbox One S 500GB Console | Harvey Norman AU
D12. Buy Xbox One X 1TB Console | Harvey Norman AU
D13. xbox one console | Xbox | Gumtree Australia Free Local Classifieds
D14. Xbox One - GameStop

D15. Xbox One and Xbox One S Consoles, Games and Accessories ...
D16. Xbox One - Best Buy
D17. How Much Does an Xbox One Cost? - MakeUseOf
D18. Microsoft Xbox One X: Australian Pricing, Specs And Release Date ...
D19. Xbox One | Entertainment | BIG W
D20. Xbox One S 500GB Consoles | eBay

Bing search engine
• Query 1 results
Relevant documents
S(1,2,3,7,8,10,14,16,17,18)
Irrelevant documents
S(4,5,6,9,11,12,13,15,19,20)

S1: Xbox One Packages At JB Hi-Fi Stores + All The Newest ...
S2: Xbox One - EB Games Australia
S3: Compare Xbox One Consoles | Xbox
S4: xbox one console | eBay
S5: Xbox One Games - Buy The Lastest Xbox Games Online ...
S6: Xbox One | Entertainment | BIG W
S7: xbox one - Compare The Best xbox one Prices In Australia ...
S8: Xbox One S 500GB Console | JB Hi-Fi
S9: Xbox One and Xbox One S Consoles, Games and Accessories ...
S10: xbox one console | eBay
S11: Xbox One S 500GB Console | JB Hi-Fi
S12: Xbox One X review: The pinnacle of console gaming ...
S13: Xbox One Best Price in Australia | Compare & Buy with ...
S14: Xbox One Console Deals, Games & Accessories | The Gamesmen
S15: Xbox One Games - Buy The Lastest Xbox Games Online ...
S16: Xbox One - Harvey Norman Australia
S17: New Xbox One Deal Drops The Price Of The System Nicely ...
S18: Xbox: Xbox One and Xbox One S Consoles, Games ...
S19: Xbox One - Best Buy
S20: Xbox One S 500GB Console | JB Hi-Fi

• Query 2 results
Relevant documents
S(1,3,4,6,10,13,14,15,18,19,20)
Irrelevant documents
S(2,7,8,9,11,12,16,17)
S1: Compare Xbox One Consoles | Xbox
S2: Xbox One S | Xbox
S3: Xbox One - EB Games Australia
S4: Compare Xbox One Consoles | Xbox
S5:Xbox One S | Xbox
S6: Xbox One Packages At JB Hi-Fi Stores + All The Newest ...
S7: Xbox One | Entertainment | BIG W
S8: Why does the Xbox One X cost $500? We asked Xbox chief ...
S9: Xbox One Games - Buy The Lastest Xbox Games Online ...
S10: xbox one | eBay
S11: Xbox One - Best Buy
S12: Why does the Xbox One X cost $500? We asked Xbox chief ...
S13: Xbox: Xbox One and Xbox One S Consoles, Games ...
S14: How Much Does an Xbox One Cost? - MakeUseOf
S15: Xbox One X review: The pinnacle of console gaming ...
S16: Xbox One Best Price in Australia | Compare & Buy with ...
S17: Xbox 360 | Entertainment | BIG W
S18: Xbox One Consoles, Games, Controllers - Walmart.com
S19:Xbox One - Harvey Norman Australia
S20: Xbox One and Xbox One S Consoles, Games and Accessories ...

[Figure: "Average" graphs for Google and Bing; image not available]

Based on the graphs shown in the figure above, Google is superior to the Bing search engine because it has
higher precision than Bing, meaning that a larger share of the documents it retrieves are relevant to the
query.
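The precision comparison can be checked directly from the relevance judgments listed for each query. The sketch below assumes precision is the number of relevant results divided by the number of judged results; the names `judgments`, `precision` and `mean_precision` are hypothetical. For Bing's Query 2 only 19 of the 20 results have a judgment in the lists above, so its precision is computed over 19. Recall is not computed, because the total number of relevant documents on the web is unknown.

```python
# Relevance judgments transcribed from the result lists:
# (engine, query) -> (relevant result numbers, irrelevant result numbers)
judgments = {
    ("Google", "Query 1"): ([1, 2, 3, 4, 5, 9, 10, 11, 12, 14, 16, 19, 20],
                            [6, 7, 8, 13, 15, 17, 18]),
    ("Google", "Query 2"): ([1, 2, 4, 5, 6, 7, 9, 11, 13, 14, 16, 17, 18, 19],
                            [3, 8, 10, 12, 15, 20]),
    ("Bing", "Query 1"): ([1, 2, 3, 7, 8, 10, 14, 16, 17, 18],
                          [4, 5, 6, 9, 11, 12, 13, 15, 19, 20]),
    ("Bing", "Query 2"): ([1, 3, 4, 6, 10, 13, 14, 15, 18, 19, 20],
                          [2, 7, 8, 9, 11, 12, 16, 17]),
}


def precision(relevant: list[int], irrelevant: list[int]) -> float:
    """Fraction of judged results that are relevant."""
    judged = len(relevant) + len(irrelevant)
    return len(relevant) / judged if judged else 0.0


per_query = {key: precision(rel, irr) for key, (rel, irr) in judgments.items()}

# Mean precision per engine over its two queries.
mean_precision = {}
for engine in ("Google", "Bing"):
    values = [p for (name, _), p in per_query.items() if name == engine]
    mean_precision[engine] = sum(values) / len(values)

for (engine, query), value in per_query.items():
    print(f"{engine} {query}: precision = {value:.3f}")
for engine, value in mean_precision.items():
    print(f"{engine} mean precision = {value:.3f}")
```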