Creating an Inverted Index for Data Science, Data Mining, and Information Systems

   

Added on  2023-03-17


Question 1
1 Creating an inverted index using the following documents
Document 1
Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and
systems to extract knowledge and insights from data in various forms, both structured and
unstructured.
Document 2
Data mining is the process of discovering patterns in large data sets involving methods at the
intersection of machine learning, statistics, and database systems.
Document 3
Information systems is the study of complementary networks of hardware and software that
people and organizations use to collect, filter, process, create, and distribute data.
1a) Removing stop words
Results
Document 1
Data science interdisciplinary field scientific methods, processes, algorithms systems extract
knowledge insights data various forms, structured unstructured.
Document 2
Data mining process discovering patterns large data sets involving methods intersection
machine learning, statistics, database systems
Document 3
Information systems study complementary networks hardware software people organizations
collect, filter, process, create, distribute data
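The stop-word step above can be sketched in a few lines of Python. This is a minimal illustration, not the method used to produce the results: the `STOP_WORDS` set is a hypothetical list containing only the words that were dropped from the three documents, and the lowercasing and punctuation stripping are added assumptions.

```python
import re

# Hypothetical stop-word list: only the words removed from the three documents above.
STOP_WORDS = {
    "is", "an", "that", "uses", "and", "to", "from", "in", "both",
    "the", "of", "at", "use",
}


def remove_stop_words(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


doc2 = (
    "Data mining is the process of discovering patterns in large data sets "
    "involving methods at the intersection of machine learning, statistics, "
    "and database systems"
)
print(" ".join(remove_stop_words(doc2)))
```

Applied to Document 2, this yields the same sequence of words as the result listed above, apart from case and punctuation.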
1b) Applying the Porter stemming algorithm
Stemmed documents
Document 1
Data scienc interdisciplinari field scientif method process algorithm system extract knowledg
insight data variou form structur unstructur
Document 2
Data mine process discov pattern larg data set involv method intersect machin learn statist
databas system
Document 3
Inform system studi complementari network hardwar softwar peopl organ collect filter
process creat distribut data
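To give a feel for how the stems above arise, the sketch below implements only step 1a of the Porter algorithm (plural removal). It is not a full stemmer: the later steps, which produce stems such as "scienc" or "statist", are omitted, and the function name `porter_step_1a` is chosen here for illustration.

```python
def porter_step_1a(word):
    """Apply Porter step 1a: SSES -> SS, IES -> I, SS -> SS, S -> (removed)."""
    if word.endswith("sses"):
        return word[:-2]
    if word.endswith("ies"):
        return word[:-2]
    if word.endswith("ss"):
        return word
    if word.endswith("s"):
        return word[:-1]
    return word


for w in ["processes", "systems", "various", "data"]:
    print(w, "->", porter_step_1a(w))
```

This step alone explains why "various" becomes "variou": the rule strips a final "s" without checking whether the word is actually a plural.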

1c) Merged inverted list including within-document frequencies
Term Document Frequency
algorithm 1 1
collect 3 1
complementari 3 1
creat 3 1
data 1 2
data 2 2
data 3 1
databas 2 1
discov 2 1
distribut 3 1
extract 1 1
field 1 1
filter 3 1
form 1 1
hardwar 3 1
inform 3 1
insight 1 1
interdisciplinari 1 1
intersect 2 1
involv 2 1
knowledg 1 1
larg 2 1
learn 2 1
machin 2 1
method 1 1
method 2 1
mine 2 1
network 3 1
organ 3 1
pattern 2 1
peopl 3 1
process 1 1
process 2 1
process 3 1
scienc 1 1
scientif 1 1
set 2 1
softwar 3 1
statist 2 1
structur 1 1
studi 3 1
system 1 1
system 2 1
system 3 1
unstructur 1 1
variou 1 1
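A merged list of this kind can be generated mechanically from the stemmed documents. The following is a minimal sketch, assuming the stems are lowercased first and that Porter reduces "information" to "inform"; the function name `build_inverted_list` and the `docs` dictionary are illustrative.

```python
from collections import Counter

# Stemmed, lowercased tokens of the three documents from section 1b.
docs = {
    1: "data scienc interdisciplinari field scientif method process algorithm "
       "system extract knowledg insight data variou form structur unstructur",
    2: "data mine process discov pattern larg data set involv method intersect "
       "machin learn statist databas system",
    3: "inform system studi complementari network hardwar softwar peopl organ "
       "collect filter process creat distribut data",
}


def build_inverted_list(docs):
    """Return (term, doc_id, within-document frequency) triples, sorted by term then doc_id."""
    entries = []
    for doc_id, text in docs.items():
        for term, freq in Counter(text.split()).items():
            entries.append((term, doc_id, freq))
    return sorted(entries)


merged = build_inverted_list(docs)
for term, doc_id, freq in merged:
    print(term, doc_id, freq)
```

Sorting the triples by term and then by document number gives one row per (term, document) pair, with the count of that term in that document as the third column.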

1c) Dictionary and related posting file
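One common way to split a merged list into a dictionary and a posting file is sketched below. The layout is an assumption, not the layout required by the question: the dictionary stores each term's document frequency and an offset into a flat postings list, and each posting is a (document, within-document frequency) pair. The names `build_dictionary_and_postings` and `docs` are illustrative.

```python
from collections import Counter

# Stemmed, lowercased tokens of the three documents from section 1b.
docs = {
    1: "data scienc interdisciplinari field scientif method process algorithm "
       "system extract knowledg insight data variou form structur unstructur",
    2: "data mine process discov pattern larg data set involv method intersect "
       "machin learn statist databas system",
    3: "inform system studi complementari network hardwar softwar peopl organ "
       "collect filter process creat distribut data",
}


def build_dictionary_and_postings(docs):
    """Return (dictionary, postings).

    dictionary maps term -> (document frequency, offset into postings).
    postings is a flat list of (doc_id, within-document frequency) pairs.
    """
    per_term = {}
    for doc_id in sorted(docs):
        for term, freq in Counter(docs[doc_id].split()).items():
            per_term.setdefault(term, []).append((doc_id, freq))

    dictionary, postings = {}, []
    for term in sorted(per_term):
        # Assumed layout: record where this term's postings start and how many there are.
        dictionary[term] = (len(per_term[term]), len(postings))
        postings.extend(per_term[term])
    return dictionary, postings


dictionary, postings = build_dictionary_and_postings(docs)
doc_freq, offset = dictionary["data"]
print("data", doc_freq, postings[offset:offset + doc_freq])
```

Looking up a term then takes one dictionary access followed by a slice of the postings list, which mirrors how the posting file would be read from disk.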
