Python Programming: Research Report

Added on - 16 Sep 2019

Background

The American presidential election is right around the corner, and both presidential candidates have been busy giving speeches. It is intuitively clear that Trump and Clinton have quite different styles of speaking. In this exam, we will see if we can quantify such differences by analyzing the speeches using the programming tools that we have learned through the course.

The field of research that analyzes natural language is called Natural Language Processing (NLP). The techniques that we will use in this exam are quite basic, and we will omit some of the details that one would normally use in NLP. However, despite this simplicity, we will see that we can get some interesting results.

Our analyses will be based on the following two data files, containing a selection of speeches given by the candidates since their nomination some months ago at the national conventions for each of the parties. The files can be found here:

    http://people.binf.ku.dk/wb/data/clinton_speeches.txt
    http://people.binf.ku.dk/wb/data/trump_speeches.txt

When opening the files, you will see that each speech consists of a title line starting with "#", followed by a number of speech lines:

    # Greensboro, North Carolina on Oct. 14, 2016:
    Thank you. Wow. Thank you, everybody. Nice place. Plenty of room.
    In 25 days, we're going win the state of North Carolina, which I love, and we're going to win the White House.
    ...

Formal requirements

Format: You should hand in:

1. A PDF file called exam.pdf, containing the output from the different questions below.
2. A Python file called exam.py, containing the function definitions.
3. A Python file called exam_test.py, containing test code for the individual questions.

These should be handed in as separate files (not zipped).

Content: For the Python part, remember to comment your code, use meaningful variable names, and include docstrings for each function. Also, please limit yourself to the curriculum covered in the course, that is, refrain from using list and dict comprehensions, map, zip, reduce, filter and lambda for the Python part, and tools like awk for the Unix part. Finally, please use only those external modules that are explicitly mentioned in the exam.
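As a rough illustration of the coding style these requirements ask for, the sketch below uses a docstring, comments, descriptive variable names, and an explicit loop where a list comprehension might otherwise be used. The function count_title_lines is hypothetical and is not one of the functions requested in the exam.

```python
def count_title_lines(lines):
    """Return the number of strings in `lines` that start with "#"."""
    title_count = 0
    for line in lines:
        # Title lines in the speech files start with "#"
        if line.startswith("#"):
            title_count += 1
    return title_count
```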
Question 1: Reading in the data

The rest of the exam will be done in Python.

1. In the exam.py file, write a function called read_speeches that takes a speech filename as argument, and returns a dictionary, where the keys are the titles of the speeches in the file (as strings), and the values are the corresponding speeches (as strings). You should remove the "#" character from the titles. Replace \n characters in the speech lines with a single space. Also, all speech lines starting with "[" should be omitted. Remember to close the files after reading from them.

In the exam_test.py file, call read_speeches on both the clinton_speeches.txt and the trump_speeches.txt files, and save the results in two variables called clinton_speeches_dict and trump_speeches_dict, respectively. Print out the size (number of keys) of each of these two dictionaries.

If you have problems completing this question, and therefore do not have the requested dictionaries, please follow the instructions below in order to be able to complete the remainder of the exam:

Download the following files:

    trump_speeches.json
    clinton_speeches.json

Insert the following lines of code in your program:

    import json
    json_file_trump = open("trump_speeches.json")
    json_file_clinton = open("clinton_speeches.json")
    trump_speeches_dict = json.load(json_file_trump)
    clinton_speeches_dict = json.load(json_file_clinton)
    json_file_trump.close()
    json_file_clinton.close()

Note that these dictionaries are not exactly identical to the ones you get from solving the question yourself (so you cannot use them for verification purposes).

2. In the exam.py file, write a function called merge_speeches that takes a list of speeches excluding titles (i.e. a list of strings) as argument, and returns a single string containing all speeches. As a separator between the speech strings, use a single space.

In the exam_test.py file, call merge_speeches to merge all of Clinton's speeches and all of Trump's speeches, and save the results in variables called clinton_speeches_all and trump_speeches_all, respectively (Hint: there is an easy way to get all of the values in a dictionary as a list). Print the length of the two strings to the screen.

If you have problems completing this question, and therefore do not have the requested strings containing all speeches, please follow the instructions below in order to be able to complete the remainder of the exam:

Download the following files:

    trump_speeches_all.txt
    clinton_speeches_all.txt

Insert the following lines of code in your program:

    trump_file = open("trump_speeches_all.txt")
    clinton_file = open("clinton_speeches_all.txt")
    trump_speeches_all = trump_file.read()
    clinton_speeches_all = clinton_file.read()
    trump_file.close()
    clinton_file.close()

Note that these strings are not exactly identical to the ones you get from solving the question yourself (so you cannot use them for verification purposes).

Question 2: Counting

1. In the exam.py file, write a function called count_words that takes a text (string) as input, and returns the number of words in this text. You can assume that words are defined as anything that is separated by whitespace.

In the exam_test.py file, test the count_words function on the trump_speeches_all variable, and print the result.

If you have problems completing this question, and therefore do not have the requested function, please follow the instructions below in order to be able to complete the remainder of the exam:

Insert the following lines of code in your program:

    def count_words(text):
        """Dummy function that always returns 50000"""
        return 50000

You can use this function as a replacement for the correct function in the questions below.

2. In the exam.py file, write a function called count_sentences that takes a text (string) as input, and returns the number of sentences in this text. You can assume that sentences are defined as being separated by any of the following characters: ".", "!" or "?". Hint: an easy way to solve this is using a regular expression, combined with re.split.

In the exam_test.py file, test the count_sentences function on the trump_speeches_all variable, and print the result.

If you have problems completing this question, and therefore do not have the requested function, please follow the instructions below in order to be able to complete the remainder of the exam:

Insert the following lines of code in your program:

    def count_sentences(text):
        """Dummy function that always returns 5000"""
        return 5000

You can use this function as a replacement for the correct function in the questions below.
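A minimal sketch of the four functions requested in Questions 1 and 2, assuming Python 3 and keeping to the restrictions above (no comprehensions, map, zip, reduce, filter or lambda). Two choices here are assumptions rather than part of the exam text: titles are stripped of surrounding whitespace but keep their trailing ":", and count_sentences ignores the empty pieces that re.split produces (for example after the final "." or inside "...").

```python
import re


def read_speeches(filename):
    """Return a dictionary mapping each speech title to its speech text."""
    speeches = {}
    current_title = None
    speech_file = open(filename)
    for line in speech_file:
        if line.startswith("#"):
            # Title line: remove the "#" and surrounding whitespace
            # (assumption: the trailing ":" is kept).
            current_title = line[1:].strip()
            speeches[current_title] = ""
        elif line.startswith("["):
            # Lines starting with "[" are omitted.
            continue
        elif current_title is not None:
            # Speech line: replace the newline with a single space.
            speeches[current_title] += line.replace("\n", " ")
    speech_file.close()
    return speeches


def merge_speeches(speeches):
    """Return all strings in the list `speeches` joined by single spaces."""
    return " ".join(speeches)


def count_words(text):
    """Return the number of whitespace-separated words in `text`."""
    return len(text.split())


def count_sentences(text):
    """Return the number of sentences in `text`, split on ".", "!" or "?"."""
    pieces = re.split(r"[.!?]", text)
    sentence_count = 0
    for piece in pieces:
        # Assumption: empty or whitespace-only pieces are not sentences.
        if piece.strip() != "":
            sentence_count += 1
    return sentence_count
```

In exam_test.py, the speech texts of a dictionary can be collected with list(clinton_speeches_dict.values()) before being passed to merge_speeches. Because every "." counts as a separator, abbreviations such as "Oct." raise the sentence count, which is consistent with the simplified definition of a sentence used in the exam.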