A Detailed Report on Named Entity Recognition and Viterbi Algorithm

Summary:
This report examines named entity recognition (NER), a crucial task in natural language processing, with a focus on the Viterbi algorithm. It outlines the significance of NER for predicting tag sequences, then describes the Viterbi algorithm's dynamic-programming approach, its efficient memory usage, and its role in producing optimal results. The method section details the steps involved, including the hidden state and observation sequences and the two-dimensional matrices used for probability calculations. The analysis section classifies named entities and discusses analytical models such as hidden Markov models. A sample input pattern is worked through, and the report closes by discussing the time-consuming nature of natural language processing, the implementation of the algorithm in Java, and its runtime behavior, citing several references to support its findings.
1. System:
Predicting a sequence of tags is important for learning models and for performing machine learning efficiently; when the tags identify entities, this prediction task is known as named entity recognition (NER). Many software systems have been built for this discipline to provide the best possible tag-sequence prediction (Shimazu & Okumura, 2010). The Viterbi algorithm is widely used to produce the best result in tag-sequence prediction. It is based on dynamic programming, so inputs are processed incrementally at run time and memory is used efficiently. The implementation does introduce some complexity in the calculation process: the most likely sequence is identified during the calculation, and the result is the predicted tag sequence (Nagao, 2014). NER can be applied to unstructured text and has many uses; the main challenges of such language processing lie in its design and implementation.
2. Method:
Natural language processing is used to analyze and predict the language in an input file and is a core component of artificial intelligence systems, although no single application satisfies all language-processing needs (Rui, 2015). The Viterbi algorithm predicts the set of tags for a given data sequence and involves many calculations to perform this prediction task (KIM & CHOI, 2017). Because it uses dynamic programming, its memory management is efficient. It finds the hidden states underlying the observed data sequence, known as the Viterbi path, and the outcome is the predicted tag sequence (Bae, 2014). The algorithm first analyzes the given data, then finds the Viterbi path, and finally outputs the predicted sequence as the result. The steps involved in the algorithm are listed below.
The algorithm operates on a hidden state sequence X = (x1, x2, x3, ..., xT), where each xt belongs to the state space S = {s1, s2, ..., sk}.
Each state produces an observation, giving the observed sequence D = (d1, d2, ..., dT), where each dt belongs to the observation space B = {b1, b2, b3, ..., bn}.
The algorithm needs two two-dimensional matrices, each of size k x T:
1. Each element T1[i, j] stores the probability of the most likely path (x1, x2, ..., xj) that ends in state si and accounts for the first j observations.
2. Each element T2[i, j] stores the state x(j-1) of that most likely path, for 2 <= j <= T, so that the best path can be recovered by backtracking.
The table entries are filled in increasing order of k * j + i.
Input
The input to the algorithm is the observation space B = {b1, b2, b3, ..., bn}. It must be in the form of an array so that the tags can be processed and partitioned as the calculation requires. The state space S = {s1, s2, ..., sk} describes the possible states behind the input sequence.
To create the two-dimensional matrices, the user needs to provide this array information as input; two matrices are required (Nguyen, 2003).
The numbers of rows and columns of the matrices may be provided as input, and they grow as processing proceeds.
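A minimal Java sketch of this procedure is given below, since the report's implementation language is Java. It follows the T1/T2 description above; the class and method names, and the assumption that the initial, transition, and emission probabilities arrive as ready-made arrays, are illustrative choices rather than the report's actual code.

    public class Viterbi {

        // init[i]    = P(state i at time 1)
        // trans[p][i] = P(state i | previous state p)
        // emit[i][o] = P(observation symbol o | state i)
        // obs[j]     = index of the observation at time j
        public static int[] decode(double[] init, double[][] trans,
                                   double[][] emit, int[] obs) {
            int k = init.length;  // number of states s1..sk
            int T = obs.length;   // length of the observed sequence
            double[][] t1 = new double[k][T]; // probability of the best path ending in state i at time j
            int[][] t2 = new int[k][T];       // backpointer: state at time j-1 on that best path

            // Initialization: first column from the initial and emission probabilities.
            for (int i = 0; i < k; i++) {
                t1[i][0] = init[i] * emit[i][obs[0]];
                t2[i][0] = 0;
            }

            // Fill the tables in increasing order of k * j + i, as noted above.
            for (int j = 1; j < T; j++) {
                for (int i = 0; i < k; i++) {
                    for (int prev = 0; prev < k; prev++) {
                        double p = t1[prev][j - 1] * trans[prev][i] * emit[i][obs[j]];
                        if (p > t1[i][j]) {
                            t1[i][j] = p;
                            t2[i][j] = prev;
                        }
                    }
                }
            }

            // Find the most probable final state, then backtrack through T2.
            int best = 0;
            for (int i = 1; i < k; i++) {
                if (t1[i][T - 1] > t1[best][T - 1]) best = i;
            }
            int[] path = new int[T];
            path[T - 1] = best;
            for (int j = T - 1; j > 0; j--) {
                path[j - 1] = t2[path[j]][j];
            }
            return path; // indices into the state space S = {s1, ..., sk}
        }

        public static void main(String[] args) {
            // Toy example with two hidden states and two observation symbols.
            double[] init = { 0.6, 0.4 };
            double[][] trans = { { 0.7, 0.3 }, { 0.4, 0.6 } };
            double[][] emit = { { 0.9, 0.1 }, { 0.2, 0.8 } };
            int[] obs = { 0, 1, 1, 0 };
            System.out.println(java.util.Arrays.toString(decode(init, trans, emit, obs)));
        }
    }

In practice, log probabilities are normally used so that long sequences do not underflow; the products above then become sums.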
3. Analysis:
Named entity recognition, otherwise known as entity identification, usually refers to entity extraction and is a subtask of information extraction (Viterbi algorithm, 2012). Examples of named entities include people, locations, geopolitical bodies, and events (Bae, 2014). Named entities can be classified into three top-level categories:
Entity names
Temporal expressions
Number expressions
Named entities are chunks of text, so a parsing or chunking prediction model is required to check whether a group of tokens belongs to the same entity or not (KIM & CHOI, 2017). Several algorithms are available for chunking, namely the Viterbi algorithm and beam-search algorithms; a small illustration of such chunk tags follows.
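For illustration only (the sentence, the tag names, and the class BioExample below are invented for this sketch), named-entity chunks are commonly written with BIO tags, where B-X starts an entity of type X, I-X continues it, and O marks tokens outside any entity:

    // Hypothetical sentence with BIO-encoded named-entity chunks:
    // "Los Angeles" forms one LOCATION entity spanning two tokens.
    public class BioExample {
        public static void main(String[] args) {
            String[] tokens = { "Jim", "flew", "to", "Los", "Angeles", "yesterday" };
            String[] tags   = { "B-PER", "O", "O", "B-LOC", "I-LOC", "O" };
            // Consecutive tokens belong to the same entity when a B-X tag is
            // followed by I-X tags; a Viterbi or beam-search decoder predicts
            // one such tag per token from the scores of a trained model.
            for (int i = 0; i < tokens.length; i++) {
                System.out.println(tokens[i] + "\t" + tags[i]);
            }
        }
    }

Under this encoding, deciding whether a group of tokens belongs to the same entity reduces to exactly the kind of tag-sequence prediction the Viterbi algorithm performs.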
Inference is also involved in checking a named entity when there is a chance of ambiguity; for example, "Los Angeles" may refer to a name or a location (Manning, 2015). The named entities therefore have to be analyzed (Viterbi, 2010). For this purpose, several analytical models are used, such as maximum entropy models, hidden Markov models, and other statistical methods.
Markov models:
A hidden Markov model treats the tag sequence as hidden states and the observed tokens as emissions; the Viterbi algorithm then recovers the most likely hidden state sequence, which is why the two are commonly paired for this task.
4. Example:
Sample input pattern: 1011010100
The table below traces the input bit stream, the encoded output bit pairs, and the ideal soft-decision values a Viterbi decoder would receive at each time step.

Time                        0      1     2     3     4     5     6     7     8     9
Input                       1      0     1     1     0     1     0     1     0     0
Encoded output (bit pairs)  11     10    00    01    01    00    10    00    10    11
Soft decision (ideal)       -4,-4  -4,3  3,3   3,-4  3,-4  3,3   -4,3  3,3   -4,3  -4,-4
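The encoded row is consistent with a standard rate-1/2 convolutional encoder using generator polynomials 7 and 5 in octal, a classic Viterbi-decoding example; the sketch below reproduces it under that assumption (the class name EncoderDemo is illustrative):

    public class EncoderDemo {
        public static void main(String[] args) {
            // Input bit stream from the sample pattern above.
            int[] input = { 1, 0, 1, 1, 0, 1, 0, 1, 0, 0 };
            int s1 = 0, s2 = 0; // two-bit shift-register state, initially zero
            StringBuilder encoded = new StringBuilder();
            for (int in : input) {
                int out1 = in ^ s1 ^ s2; // generator 111 (7 octal)
                int out2 = in ^ s2;      // generator 101 (5 octal)
                encoded.append(out1).append(out2).append(' ');
                s2 = s1; // shift the register
                s1 = in;
            }
            // Prints: 11 10 00 01 01 00 10 00 10 11
            System.out.println(encoded.toString().trim());
        }
    }

The ideal soft-decision row then maps each encoded 0 to +3 and each encoded 1 to -4, which is what a soft-decision Viterbi decoder would use as its branch metrics.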
5. Result:
Natural language processing is a time-consuming task because it takes considerable time for both design and implementation. The design part contains many calculations and predicted results; the calculations can be performed manually but require more time. While implementing the algorithm, the user needs to consider the features of the programming language that help provide a good result within a reasonable amount of time. Here the Viterbi algorithm is used to process the named entities as a sequence of tags, and it is implemented in Java, an object-oriented programming language. The run time of the algorithm is longer than usual: input processing and analysis take some time, and the core calculation, whose cost grows on the order of k * k * T for k states and T observations, also takes considerable time to produce the result.
6. References:
Bae, K. (2014). An Efficient Named Entity and Topic Word Recognition Method Based on
Named Entity Pattern in a Natural Language Interface. The Journal Of Korean Institute Of
Information Technology, 12(1). doi: 10.14801/kiitr.2014.12.1.121
Book On Demand. (2012). Viterbi algorithm. [Place of publication not identified].
Kim, E., & Choi, K. (2017). Entity Summarization Based on Entity Grouping in Multilingual Projected Entity Space. IEICE Transactions On Information And Systems, E100.D(9), 2138-2146. doi: 10.1587/transinf.2016edp7235
Manning, L. (2015). No Identity Without an Entity. Pacific Philosophical Quarterly, 96(2),
279-305. doi: 10.1111/papq.12074
Minh Quang, N. (2015). Viterbi Algorithm for English - Vietnamese Sentence Retrieval in EBMT. Journal Of Science, Educational Science, 60(7A), 3-9. doi: 10.18173/2354-1075.2015-0047
Nagao, M. (2014). Special Issue: “Collection of Best Annual Papers” Organized for the 20th
Anniversary of the Association for Natural Language Processing. Journal Of Natural
Language Processing, 21(4), 617-618. doi: 10.5715/jnlp.21.617
Nguyen, H. (2003). The expectation-maximization Viterbi algorithm for blind channel
identification and equalization.
Rui, Q. (2015). The Game Between Accounting Oversight Entity and Enterprise Accounting
Entity. Journal Of Finance And Accounting, 3(4), 97. doi: 10.11648/j.jfa.20150304.15