Background

The American presidential election is right around the corner, and both presidential candidates have been busy giving speeches. It is intuitively clear that Trump and Clinton have quite different styles of speaking. In this exercise, we will see if we can quantify such differences by analyzing the speeches using the programming tools that we have learned during the course.

The field of research concerned with analyzing natural language is called Natural Language Processing (NLP). The techniques that we will use in this exercise are quite basic, and we will omit some of the details that one would normally include in NLP. However, despite this simplicity, we will see that we can get some interesting results.

Our analyses will be based on the following two data files, containing a selection of speeches given by the candidates since their nominations at their parties' national conventions some months ago. The files can be found here:

• http://people.binf.ku.dk/wb/data/clinton_speeches.txt
• http://people.binf.ku.dk/wb/data/trump_speeches.txt

When opening the files, you will see that each speech consists of a title line starting with "#", followed by a number of speech lines:

    # Greensboro, North Carolina on Oct. 14, 2016:
    Thank you. Wow. Thank you, everybody. Nice place. Plenty of room.
    In 25 days, we're going win the state of North Carolina, which I love, and we're going to win the White House.
    ...

Formal requirements:

Format:
You should hand in:
1. A PDF file called exercise.pdf containing the output from the different exercises below.
2. A Python file called exercise.py, containing the function definitions.
3. A Python file called exercise_test.py, containing test code for the individual questions.

These should be handed in as separate files (not zipped).
Content:
For the Python part, remember to comment your code, use meaningful variable names, and include docstrings for each function. Also, please limit yourself to the curriculum covered in the course, that is, refrain from using list and dict comprehensions, map, zip, reduce, filter and lambda for the Python part, and tools like awk for the Unix part. Finally, please use only those external modules that are explicitly mentioned in the exercise.

Question 1: Reading in the data

The rest of the exercise will be done in Python.

In the exercise.py file, write a function called read_speeches that takes a speech filename as argument, and returns a dictionary where the keys are the titles of the speeches in the file (as strings), and the values are the corresponding speeches (as strings). You should remove the "#" character from the titles. Replace \n characters in the speech lines with a single space. Also, all speech lines starting with "[" should be omitted. Remember to close the files after reading from them.

In the exercise_test.py file, call read_speeches on both the clinton_speeches.txt and the trump_speeches.txt files, and save the results in two variables called clinton_speeches_dict and trump_speeches_dict, respectively.
Print out the size (number of keys) of each of these two dictionaries.

If you have problems completing this exercise, and therefore do not have the requested dictionaries, please follow these instructions in order to be able to complete the remainder of the exercises:

Download the following files:
• trump_speeches.json
• clinton_speeches.json

Insert the following lines of code in your program:

    import json
    json_file_trump = open("trump_speeches.json")
    json_file_clinton = open("clinton_speeches.json")
    trump_speeches_dict = json.load(json_file_trump)
    clinton_speeches_dict = json.load(json_file_clinton)
    json_file_trump.close()
    json_file_clinton.close()

Note that these dictionaries are not exactly identical to the ones you get from solving the exercise yourself (so you cannot use them for verification purposes).

2. In the exercise.py file, write a function called merge_speeches that takes a list of speeches excluding titles (i.e. a list of strings) as argument, and returns a single string containing all speeches. As a separator between the speech strings, use a single space.

In the exercise_test.py file, call merge_speeches to merge all of Clinton's speeches and all of Trump's speeches, and save the results in variables called clinton_speeches_all and trump_speeches_all, respectively (Hint: there is an easy way to get all of the values in a dictionary as a list). Print the lengths of the two strings to screen.
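A minimal sketch of the two Question 1 functions, written within the course restrictions (no comprehensions, map, zip or lambda). It assumes that titles are stripped of surrounding whitespace once the "#" is removed, and that any lines appearing before the first title line are ignored. The helper parse_speech_lines is a hypothetical name introduced here so that the parsing logic can be tried on a plain list of lines without needing a file.

```python
def parse_speech_lines(lines):
    """Build a {title: speech} dictionary from an iterable of lines.

    Hypothetical helper: title lines start with "#", and every following
    line belongs to that title's speech until the next title line.
    """
    speeches = {}
    current_title = None
    for line in lines:
        if line.startswith("#"):
            # Remove the "#" and surrounding whitespace (assumption: titles
            # should not keep leading/trailing spaces or the newline).
            current_title = line[1:].strip()
            speeches[current_title] = ""
        elif line.startswith("["):
            # The exercise says lines starting with "[" are omitted.
            continue
        elif current_title is not None:
            # Replace the newline with a single space and append the line.
            speeches[current_title] = speeches[current_title] + line.replace("\n", " ")
        # Lines before the first title are ignored (assumption).
    return speeches


def read_speeches(filename):
    """Return a {title: speech} dictionary for the given speech file."""
    speech_file = open(filename)
    lines = speech_file.readlines()
    speech_file.close()  # close the file as soon as it has been read
    return parse_speech_lines(lines)


def merge_speeches(speech_list):
    """Join a list of speech strings into one string, separated by single spaces."""
    return " ".join(speech_list)
```

In exercise_test.py, read_speeches("clinton_speeches.txt") would then give clinton_speeches_dict, and merge_speeches(list(clinton_speeches_dict.values())) would give clinton_speeches_all. Because each newline is replaced by a space, every speech string produced by this sketch ends with a trailing space.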
If you have problems completing this exercise, and therefore do not have the requested strings containing all speeches, please follow these instructions in order to be able to complete the remainder of the exercises:

Download the following files:
• trump_speeches_all.txt
• clinton_speeches_all.txt

Insert the following lines of code in your program:

    trump_file = open("trump_speeches_all.txt")
    clinton_file = open("clinton_speeches_all.txt")
    trump_speeches_all = trump_file.read()
    clinton_speeches_all = clinton_file.read()
    trump_file.close()
    clinton_file.close()

Note that these strings are not exactly identical to the ones you get from solving the exercise yourself (so you cannot use them for verification purposes).

Question 2: Counting

In the exercise.py file, write a function called count_words that takes a text (string) as input, and returns the number of words in this text. You can assume that words are defined as anything that is separated by whitespace.

In the exercise_test.py file, test the count_words function on the trump_speeches_all variable, and print the result.

If you have problems completing this exercise, and therefore do not have the requested function, please follow these instructions in order to be able to complete the remainder of the exercises:

Insert the following lines of code in your program:

    def count_words(text):
        """Dummy function that always returns 50000"""
        return 50000

You can use this function as a replacement for the correct function in the questions below.

2. In the exercise.py file, write a function called count_sentences that takes a text (string) as input, and returns the number of sentences in this text. You can assume that sentences are defined as being separated by any of the following characters: ".", "!" or "?".
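The two counting functions from Question 2 could be sketched as below. count_words relies on str.split() with no argument, which splits on any run of whitespace and ignores leading and trailing whitespace. count_sentences uses re.split with the character class [.!?]. Discarding empty or whitespace-only pieces (which re.split produces for "..." or after a final full stop) is an assumption, since the exercise does not say how such pieces should be counted.

```python
import re


def count_words(text):
    """Return the number of whitespace-separated words in text."""
    # split() without arguments splits on any run of whitespace.
    return len(text.split())


def count_sentences(text):
    """Return the number of sentences, using ".", "!" and "?" as separators."""
    pieces = re.split(r"[.!?]", text)
    count = 0
    for piece in pieces:
        # Skip empty/whitespace-only pieces left between adjacent separators
        # or after the last separator (assumption, see above).
        if piece.strip() != "":
            count = count + 1
    return count
```

With this definition, abbreviations such as "Oct." in "Oct. 14" also count as sentence breaks; that is a consequence of the simple definition used in the exercise rather than of the code.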
Hint: an easy way to solve this is using a regular expression, combined with re.split.

In the exercise_test.py file, test the count_sentences function on the trump_speeches_all variable, and print the result.

If you have problems completing this exercise, and therefore do not have the requested function, please follow these instructions in order to be able to complete the remainder of the exercises: