Contents

Cover page (enter your details)
Question 1
  Creating an inverted index
    Stop words removal
    Applying Porter Stemming algorithm
    Normalized tokens for each document
  Merged normalized tokens for the three documents
    Sort the normalized tokens in alphabetical order
    Create frequencies for tokens per document
    Dictionary and related posting file
    Testing
  Boolean and vector queries
Question 2 IR evaluation
  Search engines
  Targets
  Search queries
    Google search engine
    Bing search engine
    Average for Google and Bing
Bibliography
Question 1

Creating an inverted index

The following documents are used to create the inverted index.

Document 1 (D1)
Information retrieval is the activity of obtaining information resources relevant to an information need from a collection of information resources. Searches can be based on full-text or other content-based indexing.

Document 2 (D2)
Information retrieval is finding material of an unstructured nature that satisfies an information need from within large collections.

Document 3 (D3)
Information systems is the study of complementary networks of hardware and software that people and organizations use to collect, filter, process, create, and distribute data.

To create the inverted index, the following steps are followed:

1) Remove the stop words
2) Apply the Porter stemming algorithm
3) Create normalized tokens for all documents
4) Merge the tokens into one list and arrange them in alphabetical order
5) Add the frequency of each token

Stop words removal

After removing the stop words, the documents become:

D1
Information retrieval activity obtaining information resources relevant information collection information resources Searches based full-text content-based indexing

D2
Information retrieval finding material unstructured nature satisfies information within large collections

D3
Information systems study complementary networks hardware software people organizations collect filter process create distribute data

Applying Porter Stemming algorithm

After applying the Porter stemming algorithm, the documents become:

D1
Information retrieve active obtain inform resource relevant information collect information resource Search base full text content base index

D2
Information retrieve find material unstructure nature satisfy information within large collect

D3
Information system study complement network hardware software people organ collect filter process create distribute data

Normalized tokens for each document

D1

Token          Document ID
Information    1
Retrieve       1
Active         1
Obtain         1
Inform         1
Resource       1
Relevant       1
Information    1
Resource       1
Search         1
Base           1
Full           1
Text           1
Content        1
Base           1
index          1

D2

Token          Document ID
Information    2
Retrieve       2
Find           2
Material       2
Unstructure    2
Nature         2
Satisfy        2
Information    2
Within         2
Large          2
collect        2
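The remaining steps (merging the tokens, sorting them alphabetically, and attaching frequencies) can be sketched in code. The following is a minimal sketch, not the procedure used to produce this report: it assumes the stemmed token lists shown above are taken as given, lowercases them, and uses a hypothetical helper name, build_inverted_index. Each dictionary entry maps a token to its postings, stored as document ID and term frequency.

```python
from collections import Counter, defaultdict

# Stemmed tokens copied from the stemming step above and lowercased.
# (Assumption: the stemmer output is taken as given, not recomputed.)
STEMMED_DOCS = {
    1: "information retrieve active obtain inform resource relevant information "
       "collect information resource search base full text content base index",
    2: "information retrieve find material unstructure nature satisfy information "
       "within large collect",
    3: "information system study complement network hardware software people organ "
       "collect filter process create distribute data",
}


def build_inverted_index(docs):
    """Return {token: {doc_id: term_frequency}} with tokens in alphabetical order."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Counter gives the frequency of each token within one document.
        for token, freq in Counter(text.split()).items():
            index[token][doc_id] = freq
    # Merge-and-sort step: one alphabetically ordered dictionary for all documents.
    return {token: index[token] for token in sorted(index)}


index = build_inverted_index(STEMMED_DOCS)

for token, postings in index.items():
    # Document frequency is the number of documents that contain the token.
    print(f"{token:12} df={len(postings)} postings={postings}")
```

Storing the term frequency next to each document ID keeps the dictionary and the posting file in one structure, which is convenient for both Boolean and vector queries.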
Dictionary and related posting file

Testing

Testing of the inverted index using the keywords information, system and index was done using Google. The results w
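The same three keywords can also be checked directly against the index. The following is a minimal sketch under stated assumptions: the postings are hypothetical in form, with term frequencies counted from the stemmed documents earlier in this question, and the helper names (docs_with, boolean_and, boolean_and_not, cosine) are illustrative only. Boolean queries are answered with set operations, and the cosine similarity is computed over the query terms only, with every query term weighted 1.

```python
import math

# Postings for the three test keywords: {term: {doc_id: term_frequency}}.
# (Assumption: frequencies counted from the stemmed documents D1, D2, D3.)
POSTINGS = {
    "information": {1: 3, 2: 2, 3: 1},
    "system": {3: 1},
    "index": {1: 1},
}
ALL_DOCS = {1, 2, 3}
QUERY = ["information", "system", "index"]


def docs_with(term):
    """Set of document IDs whose postings contain the term."""
    return set(POSTINGS.get(term, {}))


def boolean_and(*terms):
    """Documents that contain every one of the given terms."""
    result = set(ALL_DOCS)
    for term in terms:
        result &= docs_with(term)
    return result


def boolean_and_not(term, excluded):
    """Documents that contain `term` but not `excluded`."""
    return docs_with(term) - docs_with(excluded)


def cosine(doc_id, query_terms):
    """Cosine similarity between a document and a query with unit term weights."""
    doc_vec = [POSTINGS.get(t, {}).get(doc_id, 0) for t in query_terms]
    query_vec = [1] * len(query_terms)
    dot = sum(d * q for d, q in zip(doc_vec, query_vec))
    norm = math.sqrt(sum(d * d for d in doc_vec)) * math.sqrt(sum(q * q for q in query_vec))
    return dot / norm if norm else 0.0


print(boolean_and("information", "system", "index"))
print(boolean_and_not("system", "index"))

# Rank documents from most to least similar to the query.
ranking = sorted(ALL_DOCS, key=lambda d: cosine(d, QUERY), reverse=True)
for doc_id in ranking:
    print(f"D{doc_id}: {cosine(doc_id, QUERY):.3f}")
```

Restricting the document vector to the query terms mirrors a hand calculation with three-component vectors; a fuller implementation would normalize by the length of the whole document vector.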
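Boolean and vector queries

a. Boolean queries

1) (information AND system AND index)
Result: no document. All three documents contain information, but system occurs only in D3 and index only in D1, so no document contains all three terms.

2) (system AND index)
Result: no document, because system occurs only in D3 and index only in D1.

3) (system AND NOT index)
Result: D3

b. Vector model using cosine similarity

Query Q = (information, system, index)

Each document is represented by the frequencies of the three query terms, in the order (information, system, index). The cosine similarity is

$$\sigma(D, Q) = \frac{D \cdot Q}{\lVert D \rVert \, \lVert Q \rVert}$$

For D1
D1 = <information, information, information, index> = <3, 0, 1>
Q = <information, system, index> = <1, 1, 1>

$$\sigma(D_1, Q) = \frac{3 \times 1 + 0 \times 1 + 1 \times 1}{\sqrt{3^2 + 0^2 + 1^2}\,\sqrt{1^2 + 1^2 + 1^2}} = \frac{4}{\sqrt{10}\,\sqrt{3}} \approx 0.73$$

For D2
D2 = <information, information> = <2, 0, 0>
Q = <information, system, index> = <1, 1, 1>

$$\sigma(D_2, Q) = \frac{2 \times 1 + 0 \times 1 + 0 \times 1}{\sqrt{2^2 + 0^2 + 0^2}\,\sqrt{1^2 + 1^2 + 1^2}} = \frac{2}{\sqrt{4}\,\sqrt{3}} \approx 0.58$$

For D3
D3 = <information, system> = <1, 1, 0>
Q = <information, system, index> = <1, 1, 1>

$$\sigma(D_3, Q) = \frac{1 \times 1 + 1 \times 1 + 0 \times 1}{\sqrt{1^2 + 1^2 + 0^2}\,\sqrt{1^2 + 1^2 + 1^2}} = \frac{2}{\sqrt{2}\,\sqrt{3}} \approx 0.82$$

According to these scores, the order in which the documents appear in the search results is D3, D1, D2.

Question 2 IR evaluation

Search engines

Two of the top search engines were used to perform the IR evaluation. The search engines are:

Google
Bing

Targets

Target 2: obtain the price of the new Samsung tablet.
Target 3: obtain the manual for installing Tera Term.

Search queries

Query 1 = Samsung tablet price
Query 2 = tera term installation manual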
Google search engine

Figure 1: Google search engine
Bing search engine

Figure 2: Bing search engine
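The two engines are compared by precision and recall, averaged over the queries. The following is a minimal sketch of how these measures could be computed. The result lists and relevance judgments in it are hypothetical placeholders, not the measurements behind the figures in this report, and the function names are illustrative.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant results retrieved / all results retrieved
    recall    = relevant results retrieved / all relevant results
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average(pairs):
    """Average precision and recall over a list of (precision, recall) pairs."""
    n = len(pairs)
    return (sum(p for p, _ in pairs) / n, sum(r for _, r in pairs) / n)


# Hypothetical judgments for two queries on one engine (placeholders only).
query1 = precision_recall(retrieved=["a", "b", "c", "d"], relevant={"a", "c", "e"})
query2 = precision_recall(retrieved=["p", "q"], relevant={"p", "q", "r", "s"})

avg_precision, avg_recall = average([query1, query2])
print(f"average precision={avg_precision:.2f}, average recall={avg_recall:.2f}")
```

Running the same calculation once per engine gives the two pairs of averages that a comparison chart would plot.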
Average for Google and Bing

Figure 3: Comparison by average

According to Figure 3 above, which shows the average precision and recall for Google and Bing, Google has a higher recall value and is more precise than Bing.

Bibliography

Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to Information Retrieval. Cambridge University Press.