Text Analysis Program


Added on  2019-09-19

TCSS 142 Spring 2017, Programming Assignment 1

Given the problem statement below, complete the following:
- Program schedule chart (5 pts)
- Algorithm written in English / pseudocode (10 pts)
- Test plan (15 pts)
- Program that runs correctly (60 pts): in order to receive credit, your code MUST run and complete at least some of the steps; comment out any code that shows your attempt but does not work, for possible partial credit
- Coding style and comments (10 pts)
  - meaningful identifiers, indentation, etc.
  - header comment with your name and course info at the top of the file
  - comments explaining your code
- Extra credit: algorithm as a flowchart OR code in methods (10 pts)

You are NOT allowed to help one another with this program or to use somebody else's code; check the rules listed on the syllabus. However, you may seek help from TAs, CSS mentors, and instructors.

Problem Statement:

For this assignment you are to create a program that analyzes a text file and produces the following statistics:
- number of words: any sequence of alphanumeric characters (length >= 1) with the beginning and ending non-alphanumeric characters removed counts as a word
- number of sentences: any sequence of words ending in a period, question mark, exclamation point, or semicolon
- text's vocabulary: all words of length 3 or more appearing in the file, other than "the" and "and"; the vocabulary should not contain duplicate words regardless of their capitalization (i.e. home and Home are considered to be the same word)
- lexical richness of the text: percentage of unique words in the text (ratio of vocabulary words to all words in the text)
- word length frequency information for words of any length: how many words of length 2, length 3, length 4, etc. appear in the vocabulary
- the text's Flesch index and grade level: explained below
- the concordance of a word entered by the user: explained below

You should use strings and lists, as well as the built-in string and list methods described in the textbook, to solve this problem.
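The word, sentence, vocabulary, lexical richness, and word-length statistics listed above can be sketched with nothing more than strings, lists, and their built-in methods. What follows is a minimal illustration under stated assumptions, not a reference solution. All function names (strip_token, get_words, count_sentences, build_vocabulary, lexical_richness, length_frequencies) are hypothetical. The sketch assumes that a sentence is counted whenever a whitespace-separated token containing a word ends in one of the four terminating characters, and that punctuation inside a word (as in don't) is left in place.

```python
# Hypothetical helper names; only strings, lists, and built-in methods are used.

SENTENCE_ENDINGS = ".?!;"


def strip_token(token):
    """Remove leading and trailing non-alphanumeric characters."""
    start = 0
    end = len(token)
    while start < end and not token[start].isalnum():
        start += 1
    while end > start and not token[end - 1].isalnum():
        end -= 1
    return token[start:end]


def get_words(text):
    """Return each whitespace-separated token that is non-empty after stripping."""
    words = []
    for token in text.split():
        word = strip_token(token)
        if len(word) >= 1:
            words.append(word)
    return words


def count_sentences(text):
    """Assumption: a sentence ends at a token that holds a word and ends in . ? ! or ;"""
    count = 0
    for token in text.split():
        if strip_token(token) != "" and token[-1] in SENTENCE_ENDINGS:
            count += 1
    return count


def build_vocabulary(words):
    """Unique lowercase words of length 3 or more, excluding 'the' and 'and', sorted."""
    vocabulary = []
    for word in words:
        lowered = word.lower()
        if len(lowered) >= 3 and lowered not in ["the", "and"]:
            if lowered not in vocabulary:
                vocabulary.append(lowered)
    vocabulary.sort()
    return vocabulary


def lexical_richness(words, vocabulary):
    """Vocabulary size as a percentage of the total word count (words must be non-empty)."""
    return 100.0 * len(vocabulary) / len(words)


def length_frequencies(vocabulary):
    """Return a list in which index n holds the number of vocabulary words of length n."""
    longest = 0
    for word in vocabulary:
        if len(word) > longest:
            longest = len(word)
    counts = [0] * (longest + 1)
    for word in vocabulary:
        counts[len(word)] += 1
    return counts


sample = "The cat sat. The Cat ran home! Did it run; yes?"
sample_words = get_words(sample)
sample_vocabulary = build_vocabulary(sample_words)
print(len(sample_words), count_sentences(sample))
print(sample_vocabulary)
print(length_frequencies(sample_vocabulary))
```

Keeping the frequencies in a list indexed by word length stays within strings and lists: counts[n] is simply the number of vocabulary words of length n.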
You may not use Python constructs that have not been discussed in class so far. If in doubt, ask the instructors for clarification.

Flesch index and the grade level

The Flesch/Flesch–Kincaid readability tests are designed to indicate comprehension difficulty when reading a passage of contemporary academic English. The index is based on the average number of syllables per word and the average number of words per sentence in a piece of text. Index scores usually range from 0 to 100 and are calculated using the following formula:

[formula not preserved in this copy]

Definitions of terms such as word and sentence have already been provided above. As far as syllables are concerned, we will use the following rules:
- each vowel (a, e, i, o, u) is considered a syllable unless it occurs in an -es, -ed, or -e ending
- any word of length 3 or less counts as one syllable

The grade level is calculated using the following formula:

[formula not preserved in this copy]

Note that the definitions of syllables, words, and sentences we are using are approximations of the actual syllables, words, and sentences in any natural language.

Concordance

A concordance is an alphabetical list of the principal words used in a book or body of work, with their immediate contexts. For the purposes of this assignment, your program is to ask the user for one word only and produce all the lines in which the word appears in the original text. For example, if the word nation is entered when processing the text of the Gettysburg Address, it results in the following concordance phrases:

continent a new nation, conceived in liberty and dedicated to the
a great civil war, testing whether that nation or any nation so
lives that that nation might live. it is altogether fitting and
nation under god shall have a new birth of freedom, and that

Input

Your program is to perform text analysis of an input data file called mytext.txt. In addition, your program needs to ask the user for the word for which the concordance is to be generated.

Output

Your program is to generate two output files. The first file should be called stats.txt and contain all the statistics listed above other than the vocabulary and the concordance (a sample is provided on Canvas). The second file should be called dictionary.csv and contain the text's vocabulary listed alphabetically in the first column (a sample is provided on Canvas). The concordance should be printed to the screen (as shown above).

Flowchart

A flowchart is a graphical representation of an algorithm; two good examples are provided in the textbook on p. 126 and p. 136. Note that the flowchart elements include a rectangle to indicate calculations, a diamond to indicate a decision point, and a parallelogram to indicate input/output.
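Returning to the two syllable rules and the readability scores: the sketch below shows one way they might be coded. The handout's own formulas do not appear in the text above, so the constants used here are an assumption, taken from the standard published Flesch Reading Ease and Flesch–Kincaid Grade Level formulas. The function names (count_syllables, flesch_index, grade_level) are hypothetical, and the choice to count a longer word with no remaining vowels as one syllable is also an assumption rather than a stated rule.

```python
VOWELS = "aeiou"


def count_syllables(word):
    """Count syllables using the two rules from the handout."""
    word = word.lower()
    # Rule: any word of length 3 or less counts as one syllable.
    if len(word) <= 3:
        return 1
    # Rule: a vowel in an -es, -ed, or -e ending is not counted,
    # so drop that ending before counting vowels.
    if word.endswith("es") or word.endswith("ed"):
        stem = word[:-2]
    elif word.endswith("e"):
        stem = word[:-1]
    else:
        stem = word
    count = 0
    for character in stem:
        if character in VOWELS:
            count += 1
    # Assumption (not in the handout): every word has at least one syllable.
    if count == 0:
        count = 1
    return count


def flesch_index(total_words, total_sentences, total_syllables):
    """Standard Flesch Reading Ease formula (assumed to match the handout)."""
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


def grade_level(total_words, total_sentences, total_syllables):
    """Standard Flesch-Kincaid Grade Level formula (assumed to match the handout)."""
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


for example in ["the", "nation", "dedicated", "live"]:
    print(example, count_syllables(example))
print(flesch_index(100, 5, 150), grade_level(100, 5, 150))
```

Because every vowel counts separately under these rules, a word such as nation is scored as three syllables; this is the kind of approximation the handout warns about.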
In order to receive credit, you need to provide a detailed flowchart.

Program Submission

If you want your assignment to be graded, it has to be compatible with our platform, namely Python 3.4.1. The source code file is to be called <yourNetid>_project1.py (replace <yourNetid> with your NetID). On or before the due date, use the link posted in Canvas next to Project 1 to submit your code and all associated documentation. Make sure you know how to do that before the due date, since late assignments will not be accepted. Valid documentation file formats are: doc, docx, rtf, pdf.

Program Schedule

Date Completed
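As a supplement to the Concordance section, here is a minimal sketch of how the matching lines might be collected. The function name concordance is hypothetical. The sketch assumes that lines are lowercased before printing (as the Gettysburg Address sample suggests) and that only whole words match, so nation would not match national. It strips punctuation with str.strip and the standard library constant string.punctuation, which may or may not be among the constructs covered in class.

```python
import string


def concordance(lines, target):
    """Return the lowercased lines in which target appears as a whole word."""
    target = target.lower()
    matches = []
    for line in lines:
        lowered = line.strip().lower()
        # Strip punctuation from both ends of each token before comparing.
        words = []
        for token in lowered.split():
            words.append(token.strip(string.punctuation))
        if target in words:
            matches.append(lowered)
    return matches


# Three consecutive lines of the Gettysburg Address, already lowercased.
sample_lines = [
    "continent a new nation, conceived in liberty and dedicated to the",
    "proposition that all men are created equal. now we are engaged in",
    "a great civil war, testing whether that nation or any nation so",
]

for phrase in concordance(sample_lines, "Nation"):
    print(phrase)
```

In the full program the list of lines would come from mytext.txt, for example by opening the file and calling readlines(), and the target word would come from input().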