
Python Programming: Research Report

   

Added on  2019-09-16

Background

The American presidential election is right around the corner, and both presidential candidates have been busy giving speeches. It is intuitively clear that Trump and Clinton have quite different styles of speaking. In this exam, we will see if we can quantify such differences by analyzing the speeches using the programming tools that we have learned through the course.

The field of research concerned with analyzing natural language is called Natural Language Processing (NLP). The techniques that we will use in this exam are quite basic, and we will omit some of the details that one would normally use in NLP. However, despite this simplicity, we will see that we can get some interesting results.

Our analyses will be based on the following two data files, containing a selection of speeches given by the candidates since their nomination some months ago at the national conventions for each of the parties. The files can be found here:

http://people.binf.ku.dk/wb/data/clinton_speeches.txt
http://people.binf.ku.dk/wb/data/trump_speeches.txt

When opening the files, you will see that each speech consists of a title line starting with "#", followed by a number of speech lines:

    # Greensboro, North Carolina on Oct. 14, 2016:
    Thank you. Wow. Thank you, everybody. Nice place. Plenty of room.
    In 25 days, we're going win the state of North Carolina, which I love, and we're going to win the White House.
    ...

Formal requirements

Format: You should hand in:

1. A PDF file called exam.pdf, containing the output from the different questions below.
2. A Python file called exam.py, containing the function definitions.
3. A Python file called exam_test.py, containing test code for the individual questions.

These should be handed in as separate files (not zipped).

Content: For the Python part, remember to comment your code, use meaningful variable names, and include docstrings for each function. Also, please limit yourself to the curriculum covered in the course, that is, refrain from using list and dict comprehensions, map, zip, reduce, filter and lambda for the Python part, and tools like awk for the Unix part. Finally, please use only those external modules that are explicitly mentioned in the exam.
Question 1: Reading in the data

The rest of the exam will be done in Python.

1. In the exam.py file, write a function called read_speeches that takes a speech filename as argument, and returns a dictionary, where the keys are the titles of the speeches in the file (as strings), and the values are the corresponding speeches (as strings). You should remove the "#" character from the titles. Replace \n characters in the speech lines with a single space. Also, all speech lines starting with "[" should be omitted. Remember to close the files after reading from them.

In the exam_test.py file, call read_speeches on both the clinton_speeches.txt and the trump_speeches.txt files, and save the results in two variables called clinton_speeches_dict and trump_speeches_dict, respectively. Print out the size (number of keys) of each of these two dictionaries.

If you have problems completing this question, and therefore do not have the requested dictionaries, please follow these instructions in order to be able to complete the remainder of the questions:

Download the following files:

    trump_speeches.json
    clinton_speeches.json

Insert the following lines of code in your program:

    import json
    json_file_trump = open("trump_speeches.json")
    json_file_clinton = open("clinton_speeches.json")
    trump_speeches_dict = json.load(json_file_trump)
    clinton_speeches_dict = json.load(json_file_clinton)
    json_file_trump.close()
    json_file_clinton.close()

Note that these dictionaries are not exactly identical to the ones you get from solving the question yourself (so you cannot use them for verification purposes).

2. In the exam.py file, write a function called merge_speeches that takes a list of speeches excluding titles (i.e. a list of strings) as argument, and returns a single string containing all speeches. As a separator between the speech strings, use a single space.

In the exam_test.py file, call merge_speeches to merge all of Clinton's speeches and all of Trump's speeches, and save the results in variables called clinton_speeches_all and trump_speeches_all, respectively (Hint: there is an easy way to get all of the values in a dictionary as a list). Print the length of the two strings to screen.

If you have problems completing this question, and therefore do not have the requested strings containing all speeches, please follow these instructions in order to be able to complete the remainder of the questions:

Download the following files:
    trump_speeches_all.txt
    clinton_speeches_all.txt

Insert the following lines of code in your program:

    trump_file = open("trump_speeches_all.txt")
    clinton_file = open("clinton_speeches_all.txt")
    trump_speeches_all = trump_file.read()
    clinton_speeches_all = clinton_file.read()
    trump_file.close()
    clinton_file.close()

Note that these strings are not exactly identical to the ones you get from solving the question yourself (so you cannot use them for verification purposes).

Question 2: Counting

1. In the exam.py file, write a function called count_words that takes a text (string) as input, and returns the number of words in this text. You can assume that words are defined as anything that is separated by whitespace.

In the exam_test.py file, test the count_words function on the trump_speeches_all variable, and print the result.

If you have problems completing this question, and therefore do not have the requested function, please follow these instructions in order to be able to complete the remainder of the questions:

Insert the following lines of code in your program:

    def count_words(text):
        """Dummy function that always returns 50000"""
        return 50000

You can use this function as a replacement for the correct function in the questions below.

2. In the exam.py file, write a function called count_sentences that takes a text (string) as input, and returns the number of sentences in this text. You can assume that sentences are defined as being separated by any of the following characters: ".", "!" or "?". Hint: an easy way to solve this is using a regular expression, combined with re.split.

In the exam_test.py file, test the count_sentences function on the trump_speeches_all variable, and print the result.

If you have problems completing this question, and therefore do not have the requested function, please follow these instructions in order to be able to complete the remainder of the questions:

Insert the following lines of code in your program:

    def count_sentences(text):
        """Dummy function that always returns 5000"""
        return 5000

You can use this function as a replacement for the correct function in the questions below.
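
As a rough illustration only, the four functions described in Questions 1 and 2 might be sketched as follows. This is not an official solution. Several details are assumptions rather than requirements stated in the exam text: surrounding whitespace is stripped from titles, the trailing colon of a title is kept, lines appearing before the first title are ignored, and empty fragments produced by re.split are not counted as sentences. The sketch sticks to plain loops, str methods and re.split, in line with the restriction on comprehensions, map, zip and lambda.

```python
import re


def read_speeches(filename):
    """Return a dictionary mapping speech titles to speech texts."""
    speeches = {}
    title = None
    speech_file = open(filename)
    for line in speech_file:
        if line.startswith("#"):
            # Title line: drop the "#" (assumption: also strip whitespace)
            title = line[1:].strip()
            speeches[title] = ""
        elif line.startswith("["):
            # Speech lines starting with "[" are omitted
            continue
        elif title is not None:
            # Speech line: replace the newline with a single space
            speeches[title] += line.replace("\n", " ")
    speech_file.close()
    return speeches


def merge_speeches(speeches):
    """Return all speeches in the list joined by single spaces."""
    return " ".join(speeches)


def count_words(text):
    """Return the number of whitespace-separated words in text."""
    return len(text.split())


def count_sentences(text):
    """Return the number of sentences separated by '.', '!' or '?'."""
    count = 0
    for fragment in re.split(r"[.!?]", text):
        # Assumption: fragments containing only whitespace are not sentences
        if fragment.strip() != "":
            count += 1
    return count


# Small check on the opening line of the sample speech shown earlier
sample = "Thank you. Wow. Thank you, everybody."
print(count_words(sample))      # 6
print(count_sentences(sample))  # 3
```

In exam_test.py, the merged strings could then be obtained with something like merge_speeches(list(trump_speeches_dict.values())), which is one way to read the hint about getting all values of a dictionary as a list.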
