Project [100 points] Due on 11.20 at 11:55pm.

Added on - 16 Sep 2019

Instructor's Comments:

The project is designed to develop students' ability to apply programming skills to do:

  • information extraction
  • sentiment analysis

Now, let us break the problem into 10 parts.

Task 1 [5 points]: Download and read moviereview.txt into your computer.

Task 2 [5 points]: Extract all words in moviereview.txt.

Task 3 [5 points]: Lowercase all words.

Task 4 [15 points]: Calculate word frequency (i.e., how many times each word appears in moviereview.txt).

Task 5 [15 points]: Based on the word frequencies you calculated in Task 4, sort the words in descending order and display the top 5 words and their frequencies in the console, following the format below. The output below is not the answer.

    Top 5 words in moviereview.txt, organized in a descending order:
    and appears 100 times
    film appears 40 times
    that appears 34 times
    is appears 33 times
    are appears 30 times

Task 6 [5 points]: Download and read positive.txt into your computer.

Task 7 [5 points]: Extract all words in positive.txt.

Task 8 [5 points]: Lowercase all words.

Task 9 [15 points]: Calculate word frequency (i.e., how many times each word in positive.txt appears in moviereview.txt).

Task 10 [15 points]: Based on the word frequencies you calculated in Task 9, sort the words in descending order and display the words with frequency greater than 5 in the console, following the format below. The output below is not the answer.

    Words in positive.txt that appear greater than 5 times in moviereview.txt, organized in a descending order:
    satisfy appears 10 times
    magical appears 9 times
    great appears 7 times
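One possible way to approach the ten tasks is sketched below in Python. The handout does not name a language or define what counts as a word, so several things here are assumptions: a word is taken to be a run of letters with an optional internal apostrophe, both files are assumed to be UTF-8 text in the working directory, and the helper names (extract_words, word_frequencies, top_words, positive_word_counts, report) are made up for illustration.

```python
import re
from collections import Counter


def extract_words(text):
    """Tasks 2-3 and 7-8: lowercase the text and pull out every word.

    Assumption: a word is a run of letters, optionally followed by an
    apostrophe and more letters (e.g. "don't").
    """
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def word_frequencies(words):
    """Task 4: map each word to the number of times it occurs."""
    return Counter(words)


def top_words(freq, n=5):
    """Task 5: the n most frequent words, highest count first."""
    return freq.most_common(n)


def positive_word_counts(positive_words, review_freq, threshold=5):
    """Tasks 9-10: count each positive word in the review text and keep
    those occurring more than `threshold` times, highest count first.
    Ties are broken alphabetically (an arbitrary choice)."""
    counts = {word: review_freq[word] for word in set(positive_words)}
    frequent = [(word, count) for word, count in counts.items() if count > threshold]
    return sorted(frequent, key=lambda pair: (-pair[1], pair[0]))


def read_text(path):
    """Tasks 1 and 6: read a whole file (assumed to be UTF-8)."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


def report(review_path, positive_path):
    """Print both listings in the format shown in the handout."""
    review_freq = word_frequencies(extract_words(read_text(review_path)))
    print("Top 5 words in moviereview.txt, organized in a descending order:")
    for word, count in top_words(review_freq):
        print(f"{word} appears {count} times")

    positive_words = extract_words(read_text(positive_path))
    print("Words in positive.txt that appear greater than 5 times in "
          "moviereview.txt, organized in a descending order:")
    for word, count in positive_word_counts(positive_words, review_freq):
        print(f"{word} appears {count} times")
```

With both files downloaded, calling report("moviereview.txt", "positive.txt") would print the two listings. Counter is used here because it returns 0 for words that never occur in the review, which keeps the Task 9 lookup free of special cases.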