Data Mining Project: Sentiment Analysis of Amazon, Yelp, and IMDB

Added on  2020/05/28

AI Summary
This data mining project presents an executive summary and detailed analysis of sentiment analysis applied to customer reviews from Amazon, Yelp, and IMDB. The project utilizes unsupervised machine learning techniques, specifically focusing on identifying the polarity of text data extracted from product and service reviews. The methodology involves loading data into R, preprocessing it to remove irrelevant information, and then applying sentiment analysis algorithms to classify sentences based on their emotional content (joy, anger, etc.) and polarity (positive, negative, neutral). The analysis includes the use of ggplot2 for visualizing the distribution of emotions and polarities, and word clouds to illustrate the frequency of words associated with different sentiments. The findings reveal that approximately 50% of the reviews for each company are positive, with the remaining reviews distributed between neutral and negative sentiments. The project concludes by emphasizing the significance of sentiment analysis for businesses in understanding customer feedback and gauging the reception of their products and services. The project includes references to relevant literature and the R code used for the analysis.
Document Page
Data Mining
Student Name:
University
14th January 2018
EXECUTIVE SUMMARY
Sentiment analysis, approached here as unsupervised machine learning, analyzes sentences and
words with existing algorithms such as naive Bayes to identify the polarity of a text. Its main
purpose in this project is to identify what people think and write about Amazon, Yelp and IMDB
by applying machine learning to reviews of their products and services. We found that about 50%
of the reviews for each company were positive, with the remainder split roughly equally between
neutral and negative. These findings are important because they help companies learn how a
newly introduced product or service is being received by customers or clients.
INTRODUCTION
A crucial part of our information gathering has always been to find out what other people
think. A number of challenges have emerged now that individuals can, and do, actively use
information technologies to seek out and understand the opinions of others (Murdoch and
Detsky, 2013). The sudden rise of data mining and sentiment analysis, which largely focuses on
the computational treatment of opinion, sentiment and subjectivity in text, has occurred at
least in part as a direct response to the surge of interest in new systems that deal directly
with feelings and opinions as a first-class object (Boyd and Crawford, 2012). The book by Pang
and Lee (2008) on opinion mining and sentiment analysis is the first such comprehensive survey
of this lively and important research area and will be of interest to anyone concerned with
opinion-based information-gathering systems (Dhar, 2013).
Sentiment relates to feelings, attitudes and opinions. Sentiment analysis refers to the
practice of applying natural language processing and text analysis techniques to identify and
extract subjective information from a given text. A person's mood is generally subjective and
not based on facts, so accurately determining a person's opinion or attitude from a text can
be extremely challenging. From a text analysis perspective, sentiment analysis aims to
understand the attitude of an author toward a topic in a text and its polarity: whether it is
positive, negative or neutral.
In sentiment analysis we use a lexicon-based approach to identify the mood of the words used
in a text or a review. Taboada et al. (2011) presented a lexicon-based approach to extracting
sentiment from text. Their Semantic Orientation CALculator (SO-CAL) uses dictionaries of words
annotated with their semantic orientation (polarity and strength) and incorporates negation
and intensification. Sentiment analysis is important for corporations and companies that want
to know what their customers think about them or write about them, whether on social media or
in product reviews (Pang & Lee, 2008).
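To make the lexicon-based idea concrete, the following is a minimal Python sketch of scoring a
sentence from a word dictionary with negation and intensification. The report itself worked in
R, and this is not SO-CAL's actual rule set: the tiny lexicon, the intensifier weights and the
sign-flip treatment of negation are all illustrative assumptions.

```python
import re

# Hypothetical toy lexicon: sign gives polarity, magnitude gives strength.
LEXICON = {"good": 2.0, "great": 3.0, "excellent": 4.0,
           "bad": -2.0, "terrible": -4.0, "slow": -1.0}
# Hypothetical intensifiers: multiplier applied to the next sentiment word.
INTENSIFIERS = {"very": 1.5, "really": 1.5, "slightly": 0.5}
# Hypothetical negators: flip the sign of the next sentiment word (a simplification).
NEGATORS = {"not", "never", "no"}


def score_sentence(sentence):
    """Sum the orientation of every lexicon word, adjusted by preceding modifiers."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    total = 0.0
    multiplier = 1.0
    negated = False
    for token in tokens:
        if token in NEGATORS:
            negated = True
        elif token in INTENSIFIERS:
            multiplier *= INTENSIFIERS[token]
        elif token in LEXICON:
            value = LEXICON[token] * multiplier
            total += -value if negated else value
            # Modifiers are consumed by the sentiment word they precede.
            multiplier, negated = 1.0, False
    return total


def polarity(score):
    """Map a numeric score to one of the three polarity labels."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(polarity(score_sentence("The food was very good")))
print(polarity(score_sentence("The service was not good")))
```

A sentence with no lexicon words scores zero and is labeled neutral, which is one simple way
the three polarity classes can arise from a dictionary of positive and negative words.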
HYPOTHESIS
Amazon, Yelp and IMDB products have positive reviews from their customers.
DATA
Our data consisted of three text files, each with 1,000 sentences, for a total of 3,000
sentences. The texts were made up of positive and negative sentences (Kotzias, 2015). The data
came from three websites, imdb.com, amazon.com and yelp.com, and consists of reviews of their
products: movies, e-commerce products and restaurants respectively.
METHODOLOGY
Data was loaded into R 3.3.2, running in RStudio version 0.99.441, with the readLines
function. The dataset was then prepared for analysis by removing punctuation marks, digits,
addresses and URL links; the purpose of this step is to remove tokens that are not used in the
analysis. This was done with the gsub function provided by R. All text in the data frame was
then converted to lowercase.
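The cleaning steps above can be sketched as follows. The report used readLines and gsub in R
and does not list its patterns, so this Python version is only an approximation: the regular
expressions, the reading of "addresses" as e-mail style addresses, and the function names
load_sentences and clean_sentence are assumptions made for illustration.

```python
import re


def load_sentences(path):
    """Read one sentence per line, comparable to readLines in R (path is hypothetical)."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


def clean_sentence(text):
    """Strip URLs, addresses, punctuation and digits, then lowercase the text."""
    text = re.sub(r"(https?://|www\.)\S+", " ", text)  # URL links
    text = re.sub(r"\S+@\S+", " ", text)               # e-mail style addresses (assumed)
    text = re.sub(r"[^\w\s]|_", " ", text)             # punctuation marks
    text = re.sub(r"\d+", " ", text)                   # digits
    text = re.sub(r"\s+", " ", text).strip()           # collapse leftover whitespace
    return text.lower()


raw = "Great phone!!! Visit http://example.com or mail me@shop.com, 10/10."
print(clean_sentence(raw))
```

URLs and addresses are removed before punctuation so that their dots and slashes do not leave
word fragments behind.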
ANALYSIS
Sentiment analysis of Amazon e-commerce reviews
After classifying the sentences in the dataset by polarity with a simple voter algorithm, we
obtained the following results for our sentences.
    emotion  polarity
1   joy      positive
2   joy      positive
3   anger    positive
4   joy      positive
5   anger    positive
6   anger    positive
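The report does not spell out the rules of its simple voter algorithm. One common reading is
majority voting: each word found in a lexicon casts a vote for its category, and the category
with the most votes labels the sentence. The Python sketch below follows that reading; the two
small lexicons, the tie handling and the default labels are hypothetical and are not taken
from the R code used in the project.

```python
from collections import Counter

# Hypothetical lexicons mapping a word to the category it votes for.
POLARITY_LEXICON = {"good": "positive", "great": "positive", "love": "positive",
                    "bad": "negative", "poor": "negative", "hate": "negative"}
EMOTION_LEXICON = {"love": "joy", "happy": "joy", "hate": "anger", "angry": "anger"}


def vote(sentence, lexicon, default):
    """Return the category with the most word votes, or default on no votes or a tie."""
    words = sentence.lower().split()
    votes = Counter(lexicon[word] for word in words if word in lexicon)
    if not votes:
        return default
    ranked = votes.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return default  # the two leading categories are tied
    return ranked[0][0]


sentence = "i love this great phone"
print(vote(sentence, EMOTION_LEXICON, "unknown"), vote(sentence, POLARITY_LEXICON, "neutral"))
```

Running the same voter once with an emotion lexicon and once with a polarity lexicon yields
one emotion and one polarity label per sentence, which matches the two-column layout of the
result table.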
Using ggplot2 to plot a graph of emotions on the x axis and number of sentences on the y
axis, we obtain:
Using ggplot2 to plot a graph of polarity on the x axis and number of sentences on the y axis, we
obtain:
The word cloud below shows the distribution of words by emotion: the largest words are the
most frequently used and the smallest are the least used.
Sentiment analysis of Yelp restaurant reviews
Below is the emotion and polarity classification of the Yelp reviews.
    emotion  polarity
1   anger    positive
2   anger    negative
3   anger    positive
4   anger    positive
5   joy      negative
6   joy      positive
Using ggplot2 to plot a graph of emotions on the x axis and number of sentences on the y
axis, we obtain:
Using ggplot2 to plot a graph of polarity on the x axis and number of sentences on the y axis, we
obtain:
Word cloud
Sentiment analysis of IMDB movie reviews
Below is the emotion and polarity classification of the IMDB reviews.
    emotion  polarity
1   joy      positive
2   joy      positive
3   anger    neutral
4   anger    negative
5   anger    neutral
6   anger    negative
Using ggplot2 to plot a graph of emotions on the x axis and number of sentences on the y
axis, we obtain:
Using ggplot2 to plot a graph of polarity on the x axis and number of sentences on the y axis, we obtain:
Word cloud
CONCLUSION
Amazon, Yelp and IMDB sentences were found to contain both positive and negative sentiments.
According to the sentiment analysis, the Amazon product reviews, Yelp restaurant reviews and
IMDB movie reviews each contain about 500 positive sentences, with the remaining 500 split
roughly equally between negative and neutral. This means that about 50% of customers like the
products and services they buy from these companies and only about 25% are unhappy with the
products they obtain from them.
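The percentages above follow directly from the label counts. As a small worked check in
Python, the sketch below uses rounded counts taken from the text (500 positive, 250 negative
and 250 neutral out of 1,000 sentences per site); the exact counts are not given in the
report, and the function name polarity_shares is a placeholder.

```python
from collections import Counter


def polarity_shares(labels):
    """Return the percentage of sentences carrying each polarity label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: 100.0 * count / total for label, count in counts.items()}


# Rounded counts from the conclusion, not the exact output of the analysis.
labels = ["positive"] * 500 + ["negative"] * 250 + ["neutral"] * 250
shares = polarity_shares(labels)
print(shares)
```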