CE314/CE887 Assignment 2: Parsing and Word Similarity Analysis


Added on  2023/05/28

Answer for Question 1:
Text processing and analysis play a vital role in text analytics, because a large share of all available data (an often-cited estimate is about 80%) is in the form of text and logs. The Natural Language Toolkit (NLTK) helps to analyse and understand patterns in text. Examining the structure of raw text, assigning lexical categories, and analysing meaning can reveal interesting patterns in it.
Defining a formal grammar helps us understand the structure of sentences. Syntax trees are used to represent the structure and patterns of sentences, and parsing a sentence into a syntax tree allows it to be analysed automatically.
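To make the idea of a syntax tree concrete, here is a minimal pure-Python sketch (not NLTK's own `nltk.Tree` class) that stores a tree as nested tuples and recovers both the sentence and the grammar rules used to build it. The tuple representation and the helper names `leaves` and `productions` are illustrative assumptions; `nltk.Tree` exposes the same information through its `leaves()` and `productions()` methods.

```python
# A syntax tree as nested tuples: (label, child, child, ...); leaves are plain strings.
# This toy tree covers "Bob chased a bear" using categories from the assignment grammar.
tree = ("S",
        ("NP", ("PropN", "Bob")),
        ("VP", ("V", "chased"),
               ("NP", ("Det", "a"), ("Nom", ("N", "bear")))))


def leaves(t):
    """Return the words at the frontier of the tree, left to right."""
    if isinstance(t, str):
        return [t]
    return [w for child in t[1:] for w in leaves(child)]


def productions(t):
    """Return the (lhs, rhs) grammar rules used to build the tree, top-down."""
    if isinstance(t, str):
        return []
    label, children = t[0], t[1:]
    # A child's contribution to the rule is its label (or the word itself for a leaf).
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    rules = [(label, rhs)]
    for c in children:
        rules.extend(productions(c))
    return rules


print(leaves(tree))          # ['Bob', 'chased', 'a', 'bear']
print(productions(tree)[0])  # ('S', ('NP', 'VP'))
```

Reading the rules back off a tree is the inverse of parsing: a parser searches for a tree whose productions all belong to the grammar and whose leaves are the input words.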
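A context-free phrase structure grammar (CFG) is a suitable model for describing whether a given sequence of words can be assigned a particular constituent structure. The structure of a string is built using a parsing algorithm, and several exist, such as top-down recursive descent parsing and bottom-up shift-reduce parsing. These earlier parsing methods have pitfalls. The recursive descent parser cannot handle left-recursive productions such as
Ex: NP -> NP PP
because expanding NP immediately tries to expand NP again at the same position, so the parser loops without consuming any input.
The shift-reduce parser is not complete: it commits to each shift or reduce choice without backtracking, so it can reach a dead end and fail to find a parse even for a grammatical sentence, and it returns at most one parse. This makes it unsuitable for searching over all possible analyses.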
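Both pitfalls can be reproduced with a few lines of plain Python. The sketch below is a toy, not NLTK's implementation: the grammar, lexicon, and the function names `greedy_shift_reduce` and `recursive_descent` are illustrative assumptions, although the NLTK book describes the same failure modes for `nltk.ShiftReduceParser` and `nltk.RecursiveDescentParser`. The shift-reduce recognizer reduces whenever a rule matches the top of the stack and never backtracks; the recursive descent recognizer expands rules top-down.

```python
# Toy grammar (binary rules only) and lexicon; both are illustrative assumptions.
GRAMMAR = [
    ("S", ("NP", "VP")),
    ("NP", ("Det", "N")),
    ("NP", ("NP", "PP")),   # left-recursive
    ("VP", ("V", "NP")),
    ("VP", ("VP", "PP")),   # left-recursive
    ("PP", ("P", "NP")),
]
LEXICON = {"the": "Det", "a": "Det", "dog": "N", "man": "N",
           "park": "N", "saw": "V", "in": "P"}
PRETERMINALS = set(LEXICON.values())


def greedy_shift_reduce(words):
    """Reduce whenever a rule matches the top of the stack, otherwise shift.
    There is no backtracking, so an early reduction can lead to a dead end."""
    stack, buffer = [], list(words)
    while True:
        reduced = True
        while reduced:
            reduced = False
            for lhs, rhs in GRAMMAR:
                if tuple(stack[-len(rhs):]) == rhs:
                    stack[-len(rhs):] = [lhs]
                    reduced = True
                    break
        if not buffer:
            return stack == ["S"], stack
        stack.append(LEXICON[buffer.pop(0)])  # shift the next word's category


def recursive_descent(cat, words, pos):
    """Naive top-down recognizer: return every position where `cat` can end."""
    if cat in PRETERMINALS:
        ok = pos < len(words) and LEXICON.get(words[pos]) == cat
        return [pos + 1] if ok else []
    ends = []
    for lhs, rhs in GRAMMAR:
        if lhs == cat:
            positions = [pos]
            for sym in rhs:  # NP -> NP PP re-enters NP at the same position forever
                positions = [e for p in positions
                             for e in recursive_descent(sym, words, p)]
            ends.extend(positions)
    return ends


print(greedy_shift_reduce("the dog saw a man".split()))              # (True, ['S'])
print(greedy_shift_reduce("the dog saw a man in the park".split()))  # (False, ['S', 'PP'])
try:
    recursive_descent("S", "the dog saw a man".split(), 0)
except RecursionError:
    print("left recursion: NP -> NP PP never terminates")
```

The second sentence is grammatical (the PP can attach to the VP or to the object NP), yet the greedy parser reduces "the dog saw a man" to S too early and is left holding `S PP`. The recursive descent call never returns because NP -> NP PP recurses without consuming a word, which Python reports as `RecursionError`.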
Chart Parsing:
Chart parsing is a dynamic programming technique that stores intermediate results in a table (the chart) and reuses them when needed. Instead of re-deriving the same constituent every time a different analysis needs it, the parser stores each partial solution once and combines stored pieces, which yields all complete parses efficiently.
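A minimal sketch of the dynamic programming idea follows. It uses the CKY algorithm (a bottom-up chart algorithm for binary grammars) rather than NLTK's `ChartParser`, which works with dotted edges and the Fundamental Rule. The grammar is a binarised simplification of the assignment grammar, assumed for illustration: "Bob" maps directly to NP and Nom is folded into N. Each chart cell records how many ways a category can span a range of words, is filled once from smaller cells, and is then reused by every larger span that contains it.

```python
from collections import defaultdict

# Binary toy grammar (an assumption): (lhs, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),
    ("PP", "P", "NP"),
]
LEXICON = {"Bob": "NP", "a": "Det", "the": "Det", "bear": "N", "park": "N",
           "river": "N", "chased": "V", "in": "P", "along": "P"}


def count_parses(words, start="S"):
    """CKY chart: chart[i][j][A] = number of ways category A can span words[i:j]."""
    n = len(words)
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1][LEXICON[w]] = 1
    for width in range(2, n + 1):          # shorter spans are always filled first
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):      # split point between the two children
                left, right = chart[i][k], chart[k][j]
                for lhs, b, c in BINARY_RULES:
                    if left[b] and right[c]:
                        # Reuse the stored counts instead of re-parsing the sub-spans.
                        chart[i][j][lhs] += left[b] * right[c]
    return chart[0][n][start]


print(count_parses("Bob chased a bear in the park along the river".split()))  # 5
```

The result, 5, matches the number of trees NLTK's chart parser prints for the same sentence in the program below.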
Program and outcomes:
from __future__ import print_function, division, unicode_literals  # must be the first statement; a no-op on Python 3
import itertools
import pprint
import re
import warnings
from functools import total_ordering

import nltk
NLTK (Natural Language Toolkit) is a Python library for natural language processing. The given sentences were parsed in a Jupyter notebook running the Anaconda distribution of Python 3.7. To parse them, the nltk package must be imported into the notebook, together with the other modules shown in the code above.
The grammar for the given sentences is defined with nltk.CFG.fromstring. Noun phrases, verb phrases, proper nouns, nouns, adjectives, verbs and prepositions are all defined before parsing. The parser raises an error if a word in the sentence is missing from the grammar.
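NLTK performs this check itself: `CFG.check_coverage(tokens)` raises `ValueError` when a token has no lexical rule, and the chart parser calls it before parsing. The dependency-free sketch below shows the same idea; the helper names `lexicon_words` and `uncovered` are hypothetical, and the word lists are copied from the lexical rules of the grammar used in this answer.

```python
import re

# Lexical rules copied from the assignment grammar (terminals only).
LEXICAL_RULES = '''
PropN -> "Bill" | "Bob"
Det -> "the" | "a" | "an"
N -> "bear" | "squirrel" | "park" | "block" | "table" | "river"
Adj -> "angry" | "frightened" | "furry"
V -> "chased" | "saw" | "put" | "eats" | "eat" | "chase"
P -> "on" | "in" | "along"
'''


def lexicon_words(rules):
    """Collect every double-quoted terminal that appears in the rules."""
    return set(re.findall(r'"([^"]+)"', rules))


def uncovered(tokens, words):
    """Return, in order, the tokens that no lexical rule can produce."""
    return [t for t in tokens if t not in words]


vocab = lexicon_words(LEXICAL_RULES)
print(uncovered("Bill saw Bob chase the angry furry dog".split(), vocab))  # ['dog']
```

Running such a check before parsing turns a parser exception into a list of the exact words that need a lexical rule.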
The given sentences are assigned to variables and split into tokens for parsing.
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det Nom | PropN | NP PP
Nom -> Adj Nom | N
VP -> V NP | V S | VP PP
PP -> P NP
PropN -> "Bill" | "Bob"
Det -> "the" | "a" | "an"
N -> "bear" | "squirrel" | "park" | "block" | "table" | "river"
Adj -> "angry" | "frightened" | "furry"
V -> "chased" | "saw" | "put" | "eats" | "eat" | "chase"
P -> "on" | "in" | "along"
""")

# Tokenise the example sentences by splitting on whitespace.
sent1 = "put the block on the table".split()  # no subject NP, so S -> NP VP yields no parse
sent2 = "Bob chased a bear in the park along the river".split()
sent3 = "Bill saw Bob chase the angry furry dog".split()  # "dog" is not in the lexicon: parsing raises ValueError

# Chart parser that uses the grammar defined above.
parser = nltk.ChartParser(grammar)
parser
The parser variable holds an NLTK chart parser built from the grammar stored in grammar, and it uses those rules to chart-parse the sentences.
The second sentence is then parsed and each resulting tree is printed as follows:
<nltk.parse.chart.ChartParser at 0xb13bf835f8>
for tree in parser.parse(sent2):
    print(tree)
tree
(S
  (NP (PropN Bob))
  (VP
    (V chased)
    (NP
      (NP
        (NP (Det a) (Nom (N bear)))
        (PP (P in) (NP (Det the) (Nom (N park)))))
      (PP (P along) (NP (Det the) (Nom (N river)))))))
(S
  (NP (PropN Bob))
  (VP
    (V chased)
    (NP
      (NP (Det a) (Nom (N bear)))
      (PP
        (P in)
        (NP
          (NP (Det the) (Nom (N park)))
          (PP (P along) (NP (Det the) (Nom (N river)))))))))
(S
  (NP (PropN Bob))
  (VP
    (VP
      (VP (V chased) (NP (Det a) (Nom (N bear))))
      (PP (P in) (NP (Det the) (Nom (N park)))))
    (PP (P along) (NP (Det the) (Nom (N river))))))
(S
  (NP (PropN Bob))
  (VP
    (VP
      (V chased)
      (NP
        (NP (Det a) (Nom (N bear)))
        (PP (P in) (NP (Det the) (Nom (N park))))))
    (PP (P along) (NP (Det the) (Nom (N river))))))
(S
  (NP (PropN Bob))
  (VP
    (VP (V chased) (NP (Det a) (Nom (N bear))))
    (PP
      (P in)
      (NP
        (NP (Det the) (Nom (N park)))
        (PP (P along) (NP (Det the) (Nom (N river))))))))
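The five trees come from prepositional-phrase attachment ambiguity: each PP can attach to the verb phrase (VP -> VP PP) or to a preceding noun phrase (NP -> NP PP). For a verb phrase of the form V NP PP ... PP, the number of parses follows the Catalan numbers (1, 2, 5, 14, ...), a well-known property of PP attachment. The sketch below checks this with a recurrence derived from the rules NP -> NP PP, VP -> V NP, VP -> VP PP and PP -> P NP. The function names are hypothetical, and lexical detail such as Det Nom is abstracted away by treating each base NP as one unit.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def np_parses(m):
    """Parses of 'NP PP_1 ... PP_m' as a single NP."""
    if m == 0:
        return 1
    # The top NP -> NP PP step: the left NP keeps PP_1..PP_{j-1},
    # and PP_j's own NP absorbs the remaining m - j PPs.
    return sum(np_parses(j - 1) * np_parses(m - j) for j in range(1, m + 1))


@lru_cache(maxsize=None)
def vp_parses(m):
    """Parses of 'V NP PP_1 ... PP_m' as a single VP."""
    total = np_parses(m)  # VP -> V NP: every PP sits inside the object NP
    # VP -> VP PP: the inner VP keeps PP_1..PP_{j-1}; PP_j's NP takes the rest.
    total += sum(vp_parses(j - 1) * np_parses(m - j) for j in range(1, m + 1))
    return total


def catalan(n):
    return comb(2 * n, n) // (n + 1)


print([vp_parses(m) for m in range(5)])  # [1, 2, 5, 14, 42]
print(vp_parses(2))                      # 5: "chased a bear" + "in the park" + "along the river"
```

With two PPs the count is 5, matching the output above; a third PP would raise it to 14, which is why exhaustive parsers need the sharing that a chart provides.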
The chart parser builds its chart by repeatedly applying the Fundamental Rule. This rule combines a partial edge that is expecting a nonterminal B with a following complete edge whose left-hand side is B.
Fundamental Rule
If the chart contains the edges
[A → α • B β , (i, j)]
[B → γ • , (j, k)]
then add the new edge
[A → α B • β , (i, k)]
where α, β, and γ are (possibly empty) sequences
of terminals or non-terminals
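The rule can be written as a small pure function on dotted edges. In this minimal sketch the `Edge` class and the `fundamental_rule` function are illustrative assumptions; NLTK implements the same idea with its own edge classes and `nltk.parse.chart.FundamentalRule`. An edge stores the rule A → α • β as the right-hand side plus the dot position, together with the span (i, j).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Edge:
    """Dotted rule [lhs -> rhs[:dot] • rhs[dot:], (start, end)]."""
    lhs: str
    rhs: Tuple[str, ...]
    dot: int
    start: int
    end: int

    def is_complete(self):
        return self.dot == len(self.rhs)

    def next_symbol(self):
        """The nonterminal right after the dot, or None if the edge is complete."""
        return None if self.is_complete() else self.rhs[self.dot]


def fundamental_rule(active, complete):
    """[A -> α • B β, (i, j)] + [B -> γ •, (j, k)]  =>  [A -> α B • β, (i, k)]."""
    if (complete.is_complete()
            and active.next_symbol() == complete.lhs
            and active.end == complete.start):
        return Edge(active.lhs, active.rhs, active.dot + 1,
                    active.start, complete.end)
    return None  # the two edges cannot be combined


# S -> NP • VP over "Bob" (0, 1) plus VP -> V NP • over "chased a bear" (1, 4).
s_edge = Edge("S", ("NP", "VP"), 1, 0, 1)
vp_edge = Edge("VP", ("V", "NP"), 2, 1, 4)
print(fundamental_rule(s_edge, vp_edge))
# Edge(lhs='S', rhs=('NP', 'VP'), dot=2, start=0, end=4)
```

Because the combined edge is complete, it can in turn feed the rule as the right-hand partner of any active edge waiting for an S that starts at position 0.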