Question 1
Creating an inverted index
Remove stop words
Porter Stemming algorithm
Merged list
Posting file
Boolean Model and vector Model
Question 2 IR evaluation
Question 1
Creating an inverted index
Document 1
Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and
systems to extract knowledge and insights from data in various forms, both structured and
unstructured.
Document 2
Data mining is the process of discovering patterns in large data sets involving methods at the
intersection of machine learning, statistics, and database systems.
Document 3
Information systems is the study of complementary networks of hardware and software that
people and organizations use to collect, filter, process, create, and distribute data.
Remove stop words
Document 1
Data science interdisciplinary field scientific methods, processes, algorithms systems extract
knowledge insights data various forms, structured unstructured.
Document 2
Data mining process discovering patterns large data sets involving methods intersection
machine learning, statistics, database systems
Document 3
Information systems study complementary networks hardware software people organizations
collect, filter, process, create, distribute data
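The stop-word removal above can be sketched in a few lines of Python. This is only an illustration: the STOP_WORDS set is an assumption, reconstructed from the words that were dropped from the three documents, and is not a standard stop-word list. The function name remove_stop_words is also made up for this example.

```python
import re

# Assumed stop-word list: exactly the words dropped from Documents 1-3 above.
STOP_WORDS = {"is", "an", "that", "uses", "use", "and", "to",
              "from", "in", "both", "the", "of", "at"}


def remove_stop_words(text):
    """Lowercase the text, split it into alphabetic tokens, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


doc2 = ("Data mining is the process of discovering patterns in large data sets "
        "involving methods at the intersection of machine learning, statistics, "
        "and database systems")

print(" ".join(remove_stop_words(doc2)))
# data mining process discovering patterns large data sets involving methods
# intersection machine learning statistics database systems  (printed on one line)
```

Unlike the listing above, this sketch also lowercases the tokens and drops the commas, which is what the later stemming and sorting steps need anyway.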
Porter Stemming algorithm
Document 1
Data scienc interdisciplinari field scientif method process algorithm system extract knowledg
insight data variou form structur unstructur
Document 2
Data mine process discov pattern larg data set involv method intersect machin learn statist
databas system
Document 3
Informat system studi complementari network hardwar softwar peopl organ collect filter
process creat distribut data
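The full Porter algorithm applies several rule groups in sequence, which is too long to reproduce here. As a partial illustration, the sketch below implements only step 1a, the plural-stripping rules (sses -> ss, ies -> i, ss -> ss, s -> removed). The function name porter_step_1a is made up for this example, and the later steps that produce stems such as "scienc" or "statist" are not covered.

```python
def porter_step_1a(word):
    """Apply only step 1a of the Porter stemmer (plural suffixes)."""
    if word.endswith("sses"):
        return word[:-2]   # sses -> ss   (e.g. processes -> process)
    if word.endswith("ies"):
        return word[:-2]   # ies  -> i    (e.g. studies -> studi)
    if word.endswith("ss"):
        return word        # ss   -> ss   (unchanged)
    if word.endswith("s"):
        return word[:-1]   # s    -> ""   (e.g. methods -> method)
    return word


for w in ["methods", "processes", "systems", "statistics", "data"]:
    print(w, "->", porter_step_1a(w))
# methods -> method
# processes -> process
# systems -> system
# statistics -> statistic
# data -> data
```

Note that "statistics" only reaches "statistic" here; the stem "statist" in the listing above comes from a later step of the algorithm.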
Merged list
Merged sorted list

Term                Document
algorithm           1
collect             3
complementari       3
creat               3
data                1
data                1
data                2
data                2
data                3
databas             2
discov              2
distribut           3
extract             1
field               1
filter              3
form                1
hardwar             3
informat            3
insight             1
interdisciplinari   1
intersect           2
involv              2
knowledg            1
larg                2
learn               2
machin              2
method              1
method              2
mine                2
network             3
organ               3
pattern             2
peopl               3
process             1
process             2
process             3
scienc              1
scientif            1
set                 2
softwar             3
statist             2
structur            1
studi               3
system              1
system              2
system              3
unstructur          1
variou              1

Merged sorted list with within-document frequency

Term                Document   Frequency
algorithm           1          1
collect             3          1
complementari       3          1
creat               3          1
data                1          2
data                2          2
data                3          1
databas             2          1
discov              2          1
distribut           3          1
extract             1          1
field               1          1
filter              3          1
form                1          1
hardwar             3          1
informat            3          1
insight             1          1
interdisciplinari   1          1
intersect           2          1
involv              2          1
knowledg            1          1
larg                2          1
learn               2          1
machin              2          1
method              1          1
method              2          1
mine                2          1
network             3          1
organ               3          1
pattern             2          1
peopl               3          1
process             1          1
process             2          1
process             3          1
scienc              1          1
scientif            1          1
set                 2          1
softwar             3          1
statist             2          1
structur            1          1
studi               3          1
system              1          1
system              2          1
system              3          1
unstructur          1          1
variou              1          1
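The two lists above can be reproduced with a short Python sketch. It assumes the stemmed documents are already available as lowercase strings (copied here from the Porter stemming step), and the variable names stemmed_docs, pairs and frequencies are chosen only for this example.

```python
from collections import Counter

# Stemmed, lowercased tokens copied from the Porter stemming step.
stemmed_docs = {
    1: "data scienc interdisciplinari field scientif method process algorithm "
       "system extract knowledg insight data variou form structur unstructur",
    2: "data mine process discov pattern larg data set involv method intersect "
       "machin learn statist databas system",
    3: "informat system studi complementari network hardwar softwar peopl organ "
       "collect filter process creat distribut data",
}

# Merged sorted list: one (term, document) pair per token occurrence,
# sorted by term and then by document number.
pairs = sorted(
    (term, doc_id)
    for doc_id, text in stemmed_docs.items()
    for term in text.split()
)

# Collapse repeated pairs into within-document frequencies.
frequencies = Counter(pairs)

print(len(pairs), len(frequencies))  # 48 46
for (term, doc_id), tf in sorted(frequencies.items())[:6]:
    print(term, doc_id, tf)
# algorithm 1 1
# collect 3 1
# complementari 3 1
# creat 3 1
# data 1 2
# data 2 2
```

The 48 pairs shrink to 46 entries because "data" occurs twice in Document 1 and twice in Document 2.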
