Research on Python Programming

Added on  2019-09-13

Background

The American presidential election is right around the corner, and both presidential candidates have been busy giving speeches. It is intuitively clear that Trump and Clinton have quite different styles of speaking. In this exercise, we will see if we can quantify such differences by analyzing the speeches using the programming tools that we have learned through the course.

The research field concerned with analyzing natural language is called Natural Language Processing (NLP). The techniques that we will use in this exercise are quite basic, and we will omit some of the details that one would normally use in NLP. However, despite this simplicity, we will see that we can get some interesting results.

Our analyses will be based on the following two data files, containing a selection of speeches given by the candidates since their nomination some months ago at the national conventions of their parties. The files can be found here:

http://people.binf.ku.dk/wb/data/clinton_speeches.txt
http://people.binf.ku.dk/wb/data/trump_speeches.txt

When opening the files, you will see that each speech consists of a title line starting with "#", followed by a number of speech lines:

    # Greensboro, North Carolina on Oct. 14, 2016:
    Thank you. Wow. Thank you, everybody. Nice place. Plenty of room.
    In 25 days, we're going win the state of North Carolina, which I love, and we're going to win the White House.
    ...

Formal requirements:

Format:

You should hand in:

1. A PDF file called exercise.pdf, containing the output from the different exercises below.
2. A Python file called exercise.py, containing the function definitions.
3. A Python file called exercise_test.py, containing test code for the individual questions.

These should be handed in as separate files (not zipped).
Content:

For the Python part, remember to comment your code, use meaningful variable names, and include docstrings for each function. Also, please limit yourself to the curriculum covered in the course, that is, refrain from using list and dict comprehensions, map, zip, reduce, filter and lambda for the Python part, and tools like awk for the Unix part. Also, please use only those external modules that are explicitly mentioned in the exercise.

Question 1: Reading in the data

The rest of the exercise will be done in Python.

1. In the exercise.py file, write a function called read_speeches that takes a speech filename as argument and returns a dictionary, where the keys are the titles of the speeches in the file (as strings) and the values are the corresponding speeches (as strings). You should remove the "#" character from the titles. Replace \n characters in the speech lines with a single space. Also, all speech lines starting with "[" should be omitted. Remember to close the files after reading from them.

In the exercise_test.py file, call read_speeches on both the clinton_speeches.txt and the trump_speeches.txt files, and save the results in two variables called clinton_speeches_dict and trump_speeches_dict, respectively. Print out the size (number of keys) of each of these two dictionaries.

If you have problems completing this exercise, and therefore do not have the requested dictionaries, please follow these instructions in order to be able to complete the remainder of the exercises:

Download the following files:

trump_speeches.json
clinton_speeches.json

Insert the following lines of code in your program:

    import json
    json_file_trump = open("trump_speeches.json")
    json_file_clinton = open("clinton_speeches.json")
    trump_speeches_dict = json.load(json_file_trump)
    clinton_speeches_dict = json.load(json_file_clinton)
    json_file_trump.close()
    json_file_clinton.close()

Note that these dictionaries are not exactly identical to the ones you get from solving the exercise yourself (so you cannot use them for verification purposes).

2. In the exercise.py file, write a function called merge_speeches that takes a list of speeches excluding titles (i.e. a list of strings) as argument, and returns a single string containing all speeches. As a separator between the speech strings, use a single space.

In the exercise_test.py file, call merge_speeches to merge all of Clinton's speeches and all of Trump's speeches, and save the results in variables called clinton_speeches_all and trump_speeches_all, respectively (Hint: there is an easy way to get all of the values in a dictionary as a list). Print the length of the two resulting strings to the screen.
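The two functions described in Question 1 could be sketched roughly as follows. This is only one possible reading of the task, not a reference solution. It assumes that surrounding whitespace should be stripped from each title along with the "#" (the exercise only asks for the "#" to be removed), and that any lines appearing before the first title line can be ignored. In keeping with the stated restrictions, it avoids comprehensions, map, zip and lambda.

```python
def read_speeches(filename):
    """Return a dictionary mapping speech titles to speech texts.

    Title lines start with "#". Lines starting with "[" are skipped.
    """
    speeches = {}
    title = None
    speech_file = open(filename)
    for line in speech_file:
        if line.startswith("#"):
            # Title line: drop the "#" (and, as an assumption,
            # the surrounding whitespace) and start a new entry.
            title = line[1:].strip()
            speeches[title] = ""
        elif line.startswith("["):
            # Lines starting with "[" are omitted, as the exercise requires.
            continue
        elif title is not None:
            # Speech line: replace the newline with a single space.
            speeches[title] += line.replace("\n", " ")
    speech_file.close()
    return speeches


def merge_speeches(speeches):
    """Return all speeches in the list joined by a single space."""
    return " ".join(speeches)
```

In exercise_test.py, the merged string could then be obtained with a call such as merge_speeches(list(clinton_speeches_dict.values())), and its size printed with len.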
If you have problems completing this exercise, and therefore do not have the requested strings containing all speeches, please follow these instructions in order to be able to complete the remainder of the exercises:

Download the following files:

trump_speeches_all.txt
clinton_speeches_all.txt

Insert the following lines of code in your program:

    trump_file = open("trump_speeches_all.txt")
    clinton_file = open("clinton_speeches_all.txt")
    trump_speeches_all = trump_file.read()
    clinton_speeches_all = clinton_file.read()
    trump_file.close()
    clinton_file.close()

Note that these strings are not exactly identical to the ones you get from solving the exercise yourself (so you cannot use them for verification purposes).

Question 2: Counting

1. In the exercise.py file, write a function called count_words that takes a text (string) as input, and returns the number of words in this text. You can assume that words are defined as anything that is separated by whitespace.

In the exercise_test.py file, test the count_words function on the trump_speeches_all variable, and print the result.

If you have problems completing this exercise, and therefore do not have the requested function, please follow these instructions in order to be able to complete the remainder of the exercises:

Insert the following lines of code in your program:

    def count_words(text):
        """Dummy function that always returns 50000"""
        return 50000

You can use this function as a replacement for the correct function in the questions below.

2. In the exercise.py file, write a function called count_sentences that takes a text (string) as input, and returns the number of sentences in this text. You can assume that sentences are defined as being separated by any of the following characters: ".", "!" or "?". Hint: an easy way to solve this is using a regular expression, combined with re.split.

In the exercise_test.py file, test the count_sentences function on the trump_speeches_all variable, and print the result.

If you have problems completing this exercise, and therefore do not have the requested function, please follow these instructions in order to be able to complete the remainder of the exercises:
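Separately from the fallback instructions, the two counting functions of Question 2 could be sketched as below. This is a minimal sketch under two assumptions that the exercise does not spell out: a run of separators such as "..." or "?!" is treated as a single sentence boundary, and empty or whitespace-only pieces produced by re.split (for example after the final ".") are not counted as sentences.

```python
import re


def count_words(text):
    """Return the number of whitespace-separated words in text."""
    # str.split() without arguments splits on any run of whitespace.
    return len(text.split())


def count_sentences(text):
    """Return the number of sentences in text.

    Sentences are taken to be separated by ".", "!" or "?".
    """
    # Assumption: consecutive separators form one boundary.
    pieces = re.split(r"[.!?]+", text)
    count = 0
    for piece in pieces:
        # Assumption: skip empty pieces, e.g. the one after the last ".".
        if piece.strip() != "":
            count += 1
    return count
```

On the opening line of the sample speech, "Thank you. Wow. Thank you, everybody.", this sketch counts 6 words and 3 sentences. A stricter reading of the exercise, counting every piece returned by re.split, would give a different sentence count.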