Python Assignment: Spam Confidence Calculation from Text Files


Added on  2019/09/13

Homework Assignment
AI Summary
This Python assignment focuses on developing a program to calculate spam confidence from text files. The program reads a specified file, identifies lines containing "X-DSPAM-Confidence:", and extracts the corresponding floating-point number representing the confidence level. It calculates the average spam confidence by summing these values and dividing by the number of relevant lines. The assignment requires students to handle file input, string manipulation, and numerical calculations. Students are expected to test their code with various inputs, including error handling for invalid file names and different text files (mbox.txt and mbox-short.txt). The assignment also emphasizes proper commenting and documentation of the code, ensuring the program's readability and maintainability. The task includes creating a Python program, testing it on various inputs, and documenting the results in a Word document with screenshots.
TURN IN #1
Write a program to prompt for a file name, and then read through the file and look for lines of the form:
X-DSPAM-Confidence: 0.8475
When you encounter a line that starts with “X-DSPAM-Confidence:” pull apart the line to extract the
floating-point number on the line. Count these lines and then compute the total of the spam confidence
values from these lines. When you reach the end of the file, print out the average spam confidence.
Enter the file name: mbox.txt
Average spam confidence: 0.894128046745
Enter the file name: mbox-short.txt
Average spam confidence: 0.750718518519
Test your program on the mbox.txt and mbox-short.txt files.
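The "pull apart the line" step can be sketched as below. This is a minimal illustration rather than a required solution: the names `PATTERN` and `extract_confidence` are hypothetical, and the sketch assumes the number is the only text after the prefix, as in the sample line above.

```python
PATTERN = "X-DSPAM-Confidence:"  # prefix taken from the assignment text


def extract_confidence(line):
    """Return the float after PATTERN, or None if the line does not start with it.

    Hypothetical helper: assumes nothing but the number follows the prefix.
    """
    if not line.startswith(PATTERN):
        return None
    # Keep everything after the prefix and drop surrounding spaces and the newline.
    number_text = line[len(PATTERN):].strip()
    return float(number_text)


print(extract_confidence("X-DSPAM-Confidence: 0.8475\n"))  # 0.8475
print(extract_confidence("From: someone@example.com\n"))   # None
```

Returning `None` for non-matching lines makes it easy for the caller to skip them when counting.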
HINT:
1. Download the two text files: mbox.txt and mbox-short.txt from
http://www.pythonlearn.com/code3/ to your local machine. For ease, ensure these files reside
in the same folder as the .py file for this assignment.
2. Begin writing your code by prompting the user for the file name. Use a try-except block to exit
with a user-friendly error message if there is an error opening the file name specified.
3. Once the file is opened, use an iterative loop (e.g. “for” or “while”) to traverse each line of the
file.
4. (A quick manual exploration of mbox text files reveals that the number representing spam
confidence is found at the end of the line.) In each line, find the pattern “X-DSPAM-Confidence:”.
If this is found, extract the portion of the line after this pattern until the end of the
line. “find”, string extraction and “strip” functions are useful here.
5. Convert the numeric part extracted from the line into a float.
6. When the program has finished traversing each line in the specified file, total the number of
lines that had the pattern and compute average spam confidence.
7. Note: In your calculation for average spam confidence, do NOT count the lines that did not
contain the pattern. Be sure to comment your program adequately!
8. Remember your Python code will be graded according to our class Rubric.
9. Submit your Python code file. Name it “XXXX-dspam.py” where XXXX is your name.
10. Also submit a Word document showing screenshots of the various testing conditions. Good
programmers test all cases, so you should make sure you show tests for erroneous inputs, the
mbox-short.txt, and the mbox.txt files.
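One possible way to put hints 2 through 7 together is sketched below. It is an illustration under stated assumptions, not the graded solution: the function names `average_confidence` and `main` and the wording of the error messages are hypothetical, and the sketch assumes that only the number follows the pattern on a matching line.

```python
import sys

PATTERN = "X-DSPAM-Confidence:"  # pattern named in the assignment


def average_confidence(lines):
    """Average the values on lines containing PATTERN; None if no line matches."""
    count = 0
    total = 0.0
    for line in lines:
        position = line.find(PATTERN)
        if position == -1:
            continue  # lines without the pattern are not counted (hint 7)
        # Text from the end of the pattern to the end of the line, stripped.
        value_text = line[position + len(PATTERN):].strip()
        total += float(value_text)
        count += 1
    if count == 0:
        return None  # avoid dividing by zero when nothing matched
    return total / count


def main():
    """Prompt for a file name and print the average (hypothetical entry point)."""
    file_name = input("Enter the file name: ")
    try:
        handle = open(file_name)
    except OSError:
        # Assumed wording for the user-friendly message asked for in hint 2.
        print("File cannot be opened:", file_name)
        sys.exit(1)
    with handle:
        average = average_confidence(handle)
    if average is None:
        print("No X-DSPAM-Confidence lines found in", file_name)
    else:
        print("Average spam confidence:", average)


# Small in-memory check; a file object can be passed the same way.
sample = ["X-DSPAM-Confidence: 0.5\n", "Subject: test\n", "X-DSPAM-Confidence: 1.0\n"]
print(average_confidence(sample))  # 0.75
```

To run it interactively, call `main()` at the bottom of the script. Keeping the averaging in a function that accepts any iterable of lines lets it be tested without a file on disk. The number of digits printed may differ from the sample runs above, depending on the Python version.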