Information Storage and Retrieval

Added on  2023/06/03

AI Summary
This article discusses Broder's taxonomy of web search, Ellis model, natural language processing, topic models, ontology, and page rank in information storage and retrieval. It also provides an overview of the advantages and disadvantages of ontology and the formula for page rank.

Running head: INFORMATION STORAGE AND RETRIEVAL
Information Storage and Retrieval
Name of the Student:
Name of the University:
Author Note

Answer to question number 1
a. Using Broder’s taxonomy of web search
1. “I want to find the QUT homepage”.
The provided web information need can be classified as a navigational web search, as the
main intent of the user in this situation is to reach one particular site on the internet.
2. “I want to find the best provider for health insurance for single female, aged 20-40”
The provided web information need can be classified into the informational class of web
searches, as the main intent of the user is to acquire information that is assumed to be present
on one or more of the webpages available on the internet.
3. “I want to find out information about Queensland’s Old Government House. For
example: When was it built? Who lived there? At which University is it situated in?”
The provided web information need can be classified into the informational class of web
searches, as the main intent of the user is to acquire information that is assumed to be present
on one or more of the webpages available on the internet.
4. “I want to sign up for a cheap music streaming service that has coverage of songs from
the 1980s, 1990s and 2000s”.
The provided web information need can be classified into the transactional class of web searches,
as the main intent of the user is to perform a web-mediated activity: signing up for a music
streaming service that is cheap and covers songs from the 1980s, 1990s and 2000s.
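The classification of the four needs can be sketched as a small rule-based function. This is only an illustration, not a method taken from Broder's taxonomy itself: the cue phrases and the name classify_need are assumptions chosen to cover the example needs, and a real classifier would need much richer evidence.

```python
from enum import Enum


class QueryClass(Enum):
    NAVIGATIONAL = "navigational"
    INFORMATIONAL = "informational"
    TRANSACTIONAL = "transactional"


# Hypothetical cue phrases, chosen only for this illustration.
NAVIGATIONAL_CUES = ("homepage", "home page", "website of", "login page")
TRANSACTIONAL_CUES = ("sign up", "buy", "download", "subscribe", "book a")


def classify_need(need: str) -> QueryClass:
    """Assign an information need to one of Broder's three classes."""
    text = need.lower()
    if any(cue in text for cue in NAVIGATIONAL_CUES):
        return QueryClass.NAVIGATIONAL
    if any(cue in text for cue in TRANSACTIONAL_CUES):
        return QueryClass.TRANSACTIONAL
    # Default: the user wants information assumed to exist on some page.
    return QueryClass.INFORMATIONAL


needs = [
    "I want to find the QUT homepage",
    "I want to find the best provider for health insurance for single female, aged 20-40",
    "I want to find out information about Queensland's Old Government House",
    "I want to sign up for a cheap music streaming service",
]
labels = [classify_need(n) for n in needs]
```

The informational class is used as the fallback because it is the class assigned when no navigational or transactional intent is evident.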
b. Ellis Model
The model of information-seeking behavior described by Ellis was developed in the late 1980s.
It contains eight generic characteristics that are used to recognize patterns in the way people
seek information. The main stages of the model are Starting, Chaining, Browsing, Differentiating,
Monitoring, Extracting, Verifying and Ending.
Starting: These are the means employed by the user to begin the information-seeking process,
such as asking knowledgeable colleagues. The main activity involved is the identification of
sources of interest, which typically include resources already familiar to the user.
Chaining: In this process the user follows leads from an initial source. The process is
described as backward chaining when references cited in the source are followed and as forward
chaining when later items that cite the source are found. For instance, it is used when the
bibliographical tools required by the users are unavailable to them.
Browsing: The process involves semi-directed or semi-structured searching in areas of potential
interest. Browsing commonly takes place on the internet. For instance, a user browses by
looking through the table of contents and other parts of a report.
Differentiating:
The users filter and select sources by taking note of the nature and the quality of
the information that is offered to them.
Monitoring:
Monitoring means keeping up to date with an area by regularly following particular sources. For
instance, a user monitors the same websites while surfing the internet.
Extracting:
The process of extraction involves the activities of working systematically through a
particular source to locate material of interest. For instance, the physicists and chemists in
the studies on which the model is based exhibited this behavior.
Verifying:
This stage covers checking the accuracy of the information that has been obtained from various
types of sources.
Ending:
The ending stage concludes the information-seeking process.
c. The Ellis model and the Campbell model are compared in this section for reference. The
Ellis model was developed by Albert Ellis and the Campbell model was developed by Ian
Campbell. The Campbell theory concerns the relationship between the core irrational beliefs
and the derivatives. Ellis, on the other hand, emphasized the interactional connection
between the core irrational beliefs and the derivatives.
Answer to question number 2
a.
1. Morphological
Morphological analysis is one of the steps of natural language processing. It separates words
into morphemes and defines the class of each morpheme. The difficulty of the task depends on the
morphology of the language being processed. The

morphology of English is simpler than that of many other languages. In a language such as
Turkish, listing every word form in a dictionary is not a workable approach, as thousands of
word forms are possible for each entry.
2. Lexical
The lexicon is a structured collection of lexical entries. It holds information about the
words of the language and the categories to which they belong.
3. Syntactic
Syntactic analysis works out how the words of a sentence combine into grammatical structures.
It is commonly supported by a corpus, a large body of natural language text that is used to
accumulate statistics about the language.
4. Semantics
Lexical semantics is used for understanding the meaning of individual words within their
context. Machine translation is performed at this level of natural language processing. It
transfers the content of a text from one human language to another. This is considered to be
one of the most difficult problems and needs to be solved with the help of all the kinds of
knowledge that humans possess. Another task performed at the semantic level is named entity
recognition (NER). The other operations performed at this level are natural language
generation, natural language understanding, optical character recognition, question
answering and word sense disambiguation.
5. Discourse
The features supported by the discourse level of natural language processing include
automatic summarization, which produces a summary of a certain amount of text. This includes
summaries of articles such as those in a newspaper. Coreference resolution is the next feature;
it finds the expressions in a text that refer to the same entity. An example is anaphora
resolution, which aims at matching pronouns with the nouns to which they refer. The next
feature is discourse analysis, which includes a number of related tasks. In addition to this,
there is a feature that aims at recognizing and classifying the speech acts in a text.
6. Pragmatic
Pragmatic analysis is part of the process of extracting information from text. Specifically, it
is the portion that focuses on taking a structured set of text and figuring out what the actual
meaning was. The concept comes from the field of linguistics.
b.
1) Brill Tagger - part-of-speech (POS) tagging, available in NLTK for Python
2) Porter Stemmer - stemming
3) WordNet - lexical database, accessed through an NLTK corpus reader
4) Noun Phrase Shallow Parser/Chunker - shallow parsing (chunking) on top of POS tags, available in NLTK
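To make the idea of stemming concrete, the sketch below strips a few common English suffixes. It is deliberately not the Porter algorithm, which applies several ordered rule phases with conditions on the remaining stem; the suffix list and the name toy_stem are assumptions made for this illustration only.

```python
# Simplified suffix stripping, used here only to illustrate what a stemmer does.
SUFFIXES = ("ing", "ed", "es", "s")  # assumed, illustrative rule list


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    lower = word.lower()
    for suffix in SUFFIXES:
        if lower.endswith(suffix) and len(lower) - len(suffix) >= 3:
            return lower[: -len(suffix)]
    return lower


stems = [toy_stem(w) for w in ["searching", "searched", "searches", "search"]]
```

All four word forms reduce to the same stem, which is what allows a retrieval system to match a query term against its inflected variants.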
Answer to question number 3
a. Topic models are tools that can be shared via suitable platform technology, allowing
users to benefit from the work of others, “long-tail”-style. Topic modeling is the process
of building and maintaining topic models.
b. Three types of topic models are:
Correlated topic model
Dynamic Topic Model
Continuous Time Dynamic Topic Models
c. LDA is best used for longer texts, but it also appears to work for shorter texts such as
tweets. A bag-of-words approach is suggested for the features, with the openNLP or coreNLP
tokenizers used to obtain the tokens from the text. Tf-idf weighting is often used, and a
Gini-based weighting method can also be applied to the LDA results. The lda.gibbs.sampler
function takes a document-term matrix as input, which can be built using the tm package.
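The bag-of-words features and tf-idf weighting mentioned above can be sketched in plain Python. This is a minimal illustration under stated assumptions: whitespace tokenization stands in for the openNLP/coreNLP tokenizers, the weight is term frequency times log(N / document frequency), and the sample documents are invented for the example.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Tokenize on whitespace and count lowercase tokens."""
    return Counter(text.lower().split())


def tf_idf(docs: list[str]) -> list[dict[str, float]]:
    """Weight each term by tf * log(N / df)."""
    bags = [bag_of_words(d) for d in docs]
    n_docs = len(bags)
    # Iterating over a Counter yields each distinct term once per document.
    doc_freq = Counter(term for bag in bags for term in bag)
    weights = []
    for bag in bags:
        total = sum(bag.values())
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in bag.items()
        })
    return weights


# Hypothetical short documents, in the spirit of tweets.
docs = ["topic models find topics", "tweets are short texts", "short texts have few words"]
weights = tf_idf(docs)
```

Terms that occur in fewer documents receive a higher weight, so "topic" (one document) outweighs "short" (two documents) at the same term frequency.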
d. The number of topics in the dataset is specified by the user (or sampled from some
distribution, such as a Poisson), which is subjective and does not always reflect the true
distribution of topics. The topics are predicted from one multinomial distribution, and
the words are then predicted from another multinomial distribution that is specific to
each topic. If the true structure is more complex than a multinomial distribution, or if
there is not enough training data, the model might underfit.
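The two-stage multinomial process described above can be sketched as a small generator. This is an illustration of the generative assumptions only, not an inference procedure: the vocabulary, the Dirichlet parameters of 0.5 and the function names are hypothetical, and the number of topics is fixed in advance, which is exactly the limitation noted.

```python
import random


def sample_dirichlet(alpha: list[float], rng: random.Random) -> list[float]:
    """Draw from a Dirichlet distribution by normalizing gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topic_word, vocab, doc_alpha, length, rng):
    """Generate one document: draw a topic per word, then a word from that topic."""
    theta = sample_dirichlet(doc_alpha, rng)  # topic proportions of this document
    words = []
    for _ in range(length):
        topic = rng.choices(range(len(theta)), weights=theta)[0]
        word = rng.choices(vocab, weights=topic_word[topic])[0]
        words.append(word)
    return theta, words


rng = random.Random(0)
vocab = ["rank", "link", "page", "noun", "verb", "stem"]  # hypothetical vocabulary
num_topics = 2  # chosen by the user, not learned from the data
topic_word = [sample_dirichlet([0.5] * len(vocab), rng) for _ in range(num_topics)]
theta, words = generate_document(topic_word, vocab, [0.5] * num_topics, 8, rng)
```

Because every word is drawn from one of only num_topics multinomials, a corpus with richer structure than this cannot be represented faithfully, which is where underfitting comes from.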
Answer to question number 4
a. An information retrieval system plays an important role in current search engines, which
carry out the search on behalf of the user so that the user can make sense of the
fundamental and most vital data. The strategy is generally implemented in a question
answering (QA) framework that takes the user's question and follows a few stages to
transform the question into query form in order to find a correct answer. In conceptual
search, the search engine interprets the meaning of the user's query and the relationships
among the concepts that the documents contain within a specific domain that

produces specific answers as opposed to a list of answers. The limitation of keyword-based
search may be overcome by another web architecture known as the semantic web, through a
method called conceptual or semantic search. The ontology-based approach relies on the Jena
semantic web framework and a Semantic Information Retrieval System in which the user
enters an input query, which the Standard Parser uses for the triplet extraction
algorithm.
b. The main advantage of an ontology is its simplicity, and the main disadvantage is that it
consumes a large amount of storage space.
c. An ontology is used to increase the accuracy of the information retrieval system, which in
turn increases the efficiency of the system.
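One way an ontology can raise retrieval accuracy is query expansion, sketched below. This is an assumed illustration rather than the system described above: the mini-ontology, the documents and the function names expand and search are all hypothetical, and Jena is not used.

```python
# Hypothetical mini-ontology: each concept maps to its narrower concepts.
NARROWER = {
    "vehicle": ["car", "bicycle"],
    "car": ["sedan"],
}


def expand(term: str) -> set[str]:
    """Return the term together with all narrower concepts, transitively."""
    found = {term}
    stack = [term]
    while stack:
        for child in NARROWER.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def search(query: str, docs: dict[str, str]) -> list[str]:
    """Return ids of documents containing the query term or a narrower concept."""
    terms = expand(query.lower())
    return sorted(doc_id for doc_id, text in docs.items()
                  if terms & set(text.lower().split()))


docs = {"d1": "a red sedan for sale", "d2": "bicycle repair guide", "d3": "train timetable"}
hits = search("vehicle", docs)
```

A plain keyword match for "vehicle" would find none of these documents, whereas the expanded query retrieves the two that mention narrower concepts.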
d. A variety of candidates have been proposed for the status of category. Two of them are
particular and property. Others include event or activity, process, state of affairs and
fact. For example, some ontologists propose that there are particular objects, such as
molecules, trees and people, that events take place in these things, and that the objects
persist throughout the changes that take place within them. The events are themselves
complex entities that have event-properties. The particulars also have capacities to
change, which are among the basic properties that they hold. Or at any rate, the persisting
particulars do have capacities, and these are exhibited by their activities. Other
ontologists, however, hold that particulars are an illusion, reducible to strings of events
or processes. These event-ontologies therefore take events or processes as basic.
Answer to question number 5
a. The formula for page rank is provided below:
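PR(A) = (1 - d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
Explanation of the terms of the PageRank formula:
PR(A) – The PageRank of page A, where T1 ... Tn are the pages that link to A.
PR(Tn) – Every page has a notion of its own importance. That is “PR(T1)” for the first page
and so on up to “PR(Tn)” for the last page.
C(Tn) – Each page spreads its vote evenly across its outgoing links.
Number of outgoing links for page 1 = “C(T1)”,
Number of outgoing links for page n = “C(Tn)”,
PR(Tn)/C(Tn) – The share of the vote that page A receives from page Tn.
d = 0.85, the damping factor.
(1 - d) – This term restores the share removed by the damping factor, so that with this form
of the formula the PageRank values average 1 across all pages when every page has at least one
outgoing link. A page with no incoming links therefore has a PageRank of (1 - d).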
b.
The PageRank of each of the three documents is provided below, where each sum includes only
the documents that link to the document being ranked:
PR(D1) = (1-d) + d (PR(D1)/C(D1) + ... + PR(D3)/C(D3))
PR(D2) = (1-d) + d (PR(D1)/C(D1) + ... + PR(D3)/C(D3))
PR(D3) = (1-d) + d (PR(D1)/C(D1) + ... + PR(D3)/C(D3))
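The three equations depend on each other, so in practice they are solved by iteration. The sketch below applies the formula repeatedly until the values settle. The link structure between D1, D2 and D3 is not given in this answer, so the graph used here is a hypothetical one, and the function name page_rank and the iteration count are likewise assumptions.

```python
def page_rank(links: dict[str, list[str]], d: float = 0.85,
              iterations: int = 100) -> dict[str, float]:
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over pages T linking to A."""
    ranks = {page: 1.0 for page in links}
    for _ in range(iterations):
        new_ranks = {}
        for page in links:
            # Pages T whose outgoing links include this page.
            incoming = [src for src, targets in links.items() if page in targets]
            votes = sum(ranks[src] / len(links[src]) for src in incoming)
            new_ranks[page] = (1 - d) + d * votes
        ranks = new_ranks
    return ranks


# Hypothetical link structure: D1 links to D2 and D3, D2 to D3, D3 to D1.
links = {"D1": ["D2", "D3"], "D2": ["D3"], "D3": ["D1"]}
ranks = page_rank(links)
```

In this assumed graph D3 receives votes from both other documents and ends up with the highest value, while D2, which receives only half of D1's vote, ends up lowest.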