Text Analysis Program


Added on  2019/09/19

The assignment is to create a program that analyzes a text file, 'mytext.txt', and produces various statistics such as the number of words, sentences, vocabulary, lexical richness, word length frequency information, Flesch index, and grade level. The program should also ask the user for a specific word and generate its concordance. The output files will be 'stats.txt' and 'dictionary.csv'. The program must be written in Python 3.4.1 and submitted through Canvas.

TCSS 142 Spring 2017, Programming Assignment 1
Given the problem statement below, complete the following:
Program schedule chart (5 pts)
Algorithm written in English / pseudocode (10 pts)
Test plan (15 pts)
Program that runs correctly (60 pts) – in order to receive credit, your code MUST run
and complete at least some of the steps; comment out the code that shows your
attempt but does not work for possible partial credit
Coding style and comments (10 points)
meaningful identifiers, indentation, etc.
header comment with your name and course info at the top of the file
comments explaining your code
Extra credit: algorithm as a flowchart OR code in methods (extra credit 10 pts)
You are NOT allowed to help one another with this program or use somebody else's code –
check the rules listed on the syllabus. However, you may seek help from TAs, CSS mentors, and
Problem Statement:
For this assignment you are to create a program that analyzes a text file and produces the
following statistics:
number of words – any sequence of alphanumeric characters (length >= 1) with the
beginning and ending non-alphanumeric characters removed counts as a word
number of sentences – any sequence of words ending in a period, question mark,
exclamation point, or semicolon
text's vocabulary – all words of length 3 or more appearing in a file other than "the" and
"and"; the vocabulary should not contain duplicate words regardless of their
capitalization (i.e. home and Home are considered to be the same word)
lexical richness of the text – percentage of unique words in the text (ratio of vocab words
to all words in the text)
word length frequency information for words of any length: how many words of length 2,
length 3, length 4, etc. appear in the vocabulary.
text Flesch index and the text grade level – explained below
the concordance of the word entered by the user – explained below
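The word and sentence definitions above can be sketched as follows. This is only one possible reading of those definitions, not the required solution: the helper names strip_word, get_words, and count_sentences are hypothetical, and the sketch assumes that tokens are separated by whitespace and that a sentence ends wherever a word is directly followed by one of the four listed punctuation marks. Under that assumption an abbreviation such as "e.g." is also counted as a sentence end.

```python
def strip_word(token):
    """Remove non-alphanumeric characters from both ends of a token."""
    start = 0
    end = len(token)
    while start < end and not token[start].isalnum():
        start += 1
    while end > start and not token[end - 1].isalnum():
        end -= 1
    return token[start:end]


def get_words(text):
    """Return the words of text in order of appearance."""
    words = []
    for token in text.split():
        word = strip_word(token)
        if len(word) >= 1:
            words.append(word)
    return words


def count_sentences(text):
    """Count words that are directly followed by . ? ! or ;"""
    count = 0
    for token in text.split():
        if strip_word(token) != "":
            # Find the punctuation that trails the last alphanumeric character.
            end = len(token)
            while not token[end - 1].isalnum():
                end -= 1
            tail = token[end:]
            if "." in tail or "?" in tail or "!" in tail or ";" in tail:
                count += 1
    return count


sample = "Four score and seven years ago. Is it so? Yes; it is!"
print(len(get_words(sample)))   # 12
print(count_sentences(sample))  # 4
```

Characters inside a word, such as the apostrophe in "it's", are kept, because the definition only removes characters at the beginning and end.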
You should use strings and lists, as well as built-in string and list methods described in the
textbook, to solve this problem. However, you may not use other Python constructs that have
not been discussed in class so far. If in doubt, ask instructors for clarification.
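Staying within strings and lists, the vocabulary, lexical richness, and word length counts from the list of statistics might be computed as in the sketch below. The function names are hypothetical, the input is assumed to be a list of already extracted words, and the length counts are taken over the vocabulary, as the statistic is worded. Since the vocabulary only holds words of length 3 or more, the entries for shorter lengths stay at zero.

```python
def build_vocabulary(words):
    """Return the sorted, lowercase, duplicate-free vocabulary.

    Keeps words of length 3 or more, excluding "the" and "and".
    """
    vocabulary = []
    for word in words:
        lowered = word.lower()
        if len(lowered) >= 3 and lowered != "the" and lowered != "and":
            if lowered not in vocabulary:
                vocabulary.append(lowered)
    vocabulary.sort()
    return vocabulary


def lexical_richness(vocabulary, words):
    """Return the vocabulary size as a percentage of all words."""
    if len(words) == 0:
        return 0.0
    return 100.0 * len(vocabulary) / len(words)


def length_frequencies(vocabulary):
    """Return a list whose index n holds the number of words of length n."""
    longest = 0
    for word in vocabulary:
        if len(word) > longest:
            longest = len(word)
    counts = [0] * (longest + 1)
    for word in vocabulary:
        counts[len(word)] += 1
    return counts


words = ["The", "home", "and", "Home", "is", "a", "nation", "of", "the", "free"]
vocab = build_vocabulary(words)
print(vocab)                           # ['free', 'home', 'nation']
print(lexical_richness(vocab, words))  # 30.0
print(length_frequencies(vocab))       # [0, 0, 0, 0, 2, 0, 1]
```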
Flesch index and the grade level
The Flesch/Flesch–Kincaid readability tests are readability tests designed to indicate
comprehension difficulty when reading a passage of contemporary academic English. The index
is based on the average number of syllables per word and the average number of words per
sentence in a piece of text. Index scores usually range from 0 to 100, and are calculated using the
following formula:
Definitions of terms such as word and sentence have already been provided above. As far as
syllables are concerned, we will use the following rules:
each vowel (aeiou) is considered a syllable unless it occurs in -es, -ed, or -e ending

any word of length 3 or less counts as one syllable
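The two syllable rules stated above could be implemented along these lines. The function name count_syllables is hypothetical, and the sketch applies only those two rules: it counts every remaining vowel separately, so adjacent vowels are not merged, and a longer word with no counted vowels yields 0.

```python
def count_syllables(word):
    """Approximate the syllable count of a word using the two rules above."""
    word = word.lower()
    # Rule: any word of length 3 or less counts as one syllable.
    if len(word) <= 3:
        return 1
    # Rule: a vowel in an -es, -ed, or -e ending is not counted,
    # so drop that ending before counting.
    if word.endswith("es") or word.endswith("ed"):
        word = word[:-2]
    elif word.endswith("e"):
        word = word[:-1]
    count = 0
    for letter in word:
        if letter in "aeiou":
            count += 1
    return count


print(count_syllables("nation"))     # 3
print(count_syllables("dedicated"))  # 3
print(count_syllables("the"))        # 1
```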
The grade level is calculated using the following formula:
Note that the definitions of syllables, words, and sentences we are using are approximations of the
actual syllables, words, and sentences in any natural language.
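Neither formula is reproduced in this copy of the handout. The sketch below therefore assumes the commonly published Flesch Reading Ease and Flesch-Kincaid Grade Level formulas, which may differ from the versions originally given in the assignment. The function names are hypothetical, and the three counts are assumed to have been computed already and to be nonzero.

```python
def flesch_index(num_words, num_sentences, num_syllables):
    """Flesch Reading Ease score (standard published constants, assumed)."""
    words_per_sentence = num_words / num_sentences
    syllables_per_word = num_syllables / num_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


def grade_level(num_words, num_sentences, num_syllables):
    """Flesch-Kincaid Grade Level (standard published constants, assumed)."""
    words_per_sentence = num_words / num_sentences
    syllables_per_word = num_syllables / num_words
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


# Example: 100 words, 5 sentences, 150 syllables.
print(round(flesch_index(100, 5, 150), 3))  # 59.635
print(round(grade_level(100, 5, 150), 2))   # 9.91
```

Both functions divide by the word and sentence counts, so a real program would need to decide what to report for an empty file or a text with no sentence terminators.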
A concordance is an alphabetical list of the principal words used in a book or body of work, with
their immediate contexts. For the purpose of this assignment, your program is to ask the user for
one word only and produce all the lines in which the word appears in the original text. For
example, if the word nation is entered when processing the text of the Gettysburg Address, the
result is the following concordance:
continent a new nation, conceived in liberty and dedicated to the
a great civil war, testing whether that nation or any nation so
lives that that nation might live. it is altogether fitting and
nation under god shall have a new birth of freedom, and that
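A concordance of this kind might be produced as sketched below. The names strip_word and concordance are hypothetical. The sketch assumes the file has already been read into a list of lines, that matching ignores capitalization, that only whole words match (so nations would not match nation), and that lines are shown in lowercase, as in the example above.

```python
def strip_word(token):
    """Remove non-alphanumeric characters from both ends of a token."""
    start = 0
    end = len(token)
    while start < end and not token[start].isalnum():
        start += 1
    while end > start and not token[end - 1].isalnum():
        end -= 1
    return token[start:end]


def concordance(lines, target):
    """Return the lowercased lines in which target appears as a whole word."""
    target = target.lower()
    matches = []
    for line in lines:
        lowered = line.strip().lower()
        found = False
        for token in lowered.split():
            if strip_word(token) == target:
                found = True
        if found:
            matches.append(lowered)
    return matches


lines = [
    "continent a new nation, conceived in liberty and dedicated to the\n",
    "proposition that all men are created equal.\n",
    "A great civil war, testing whether that nation or any nation so\n",
]
for phrase in concordance(lines, "Nation"):
    print(phrase)
```

In the full program, lines would come from reading mytext.txt and the target word from a call to input().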
Your program is to perform text analysis of an input data file called mytext.txt. In addition,
your program needs to ask the user for the word for which the word concordance is to be
Your program is to generate two output files. The first file should be called stats.txt and
contain all statistics listed above other than the vocabulary or concordance – sample provided on
Canvas. The second file should be called dictionary.csv and contain the text's vocabulary
listed alphabetically in the first column – sample provided on Canvas. Concordance should be
printed to the screen (as shown above).
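Writing the two output files could look like the following sketch. The exact layout is defined by the samples on Canvas, which are not reproduced here, so the "label: value" line format for the statistics and the one-word-per-row layout for the dictionary are assumptions. The function names and the labels in the usage note are hypothetical as well.

```python
def write_stats(filename, labels, values):
    """Write one 'label: value' line per statistic (assumed layout)."""
    outfile = open(filename, "w")
    for i in range(len(labels)):
        outfile.write(labels[i] + ": " + str(values[i]) + "\n")
    outfile.close()


def write_dictionary(filename, vocabulary):
    """Write the vocabulary alphabetically, one word per row.

    With a single word on each row, every word lands in the first
    column when the file is opened as a CSV.
    """
    ordered = sorted(vocabulary)
    outfile = open(filename, "w")
    for word in ordered:
        outfile.write(word + "\n")
    outfile.close()
```

A call such as write_stats("stats.txt", ["Number of words", "Number of sentences"], [12, 4]) followed by write_dictionary("dictionary.csv", vocabulary) would then produce the two files named in the assignment.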
A flowchart is a graphical representation of an algorithm and two good examples are provided in
the textbook on p. 126 and p. 136. Note that the flowchart elements include a rectangle to
indicate calculations, a diamond to indicate a decision point, and a parallelogram to indicate
input/output. In order to receive credit, you need to provide a detailed flowchart.
Program Submission
If you want your assignment to be graded, it has to be compatible with our platform, namely
Python 3.4.1. The source code is to be called <yourNetid> (replace with
your netid).
On or before the due date, use the link posted in Canvas next to Project 1 to submit your
code and all associated documentation. Make sure you know how to do that before the due date
since late assignments will not be accepted. Valid documentation file formats are: doc, docx, rtf,
Program Schedule
Milestone    Date Completed (Planned)    Date Completed (Actual)
Assignment Received
Requirements understood; detailed specifications recorded
Top level design complete
All levels of top-down design complete
Coding complete (clean compile)
Testing planned
Testing complete
Program ready to turn in; all documentation
Assignment turned in
Test Plan
Reason for test case Input Values Expected output
