Impact of Reviews On Consumers | Amazon

Added on  2022-07-11

About This Document

This document discusses online reviews, which play an important and critical role in consumers' purchasing decisions. Consumers want to find useful information as rapidly as possible during their research. Using Amazon as an example, the document shows how genuine reviews are provided to users and optimized using various methods.

AMAZON REVIEWS
BAVANA
Applied Machine Learning
School of Computing and Mathematical Sciences
Abstract: Many reviews are now available online. In addition to providing a vital source of knowledge, this user-generated content significantly impacts customers' buying decisions: reviews play a moderate, significant, or critical role in what consumers purchase. Consumers have consequently become unsure about how far to rely on online reviews. They want to find useful information as rapidly as possible during their research, yet searching and comparing text reviews can be tedious because readers are flooded with information. The enormous volume and unstructured nature of text reviews prevent the user from selecting a product easily; rather than the written content, the star rating, which ranges from 1 to 5 on Amazon, provides a fast overview. This task is about text categorization: using machine learning models to classify review material into a specific category. Two machine learning models are used, a multinomial naive Bayes classifier and a logistic regression classifier. Before the models are trained on the review data, pre-processing techniques are applied. Both models perform well at classifying review data, and the experiments also highlight the factors that influence text classification performance.
Keywords: Data cleaning, data processing, Feature extraction, Model training
I. Introduction
Classification is a supervised machine learning task that seeks to build a model capable of generating predictions; the model is built from annotated historical data. Text categorization, often known as text classification, is a subset of classification. [1] states that, given a set of documents D and a predetermined set of classes C, the goal is to use a classification algorithm to learn a classifier that, for each document d, predicts the most probable class c. Instead of individually assigning a class to each document, which might take a long time, text categorization lets the classifier choose from a list of established categories based on human-labelled training documents. Text classification is the practice of breaking a text down into a group of words and assigning a set of predetermined tags or categories based on its context; natural language processing (NLP) techniques are used to analyse the text. NLP is also used for sentiment analysis, topic recognition, and language translation. Text classifiers fall into three primary categories: rule-based systems, automated systems employing machine learning methods, and hybrid systems.
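Of the three approaches above, a rule-based system is the simplest to illustrate. A minimal sketch, in which the category names and keyword lists are invented for illustration (a real system would need far richer rules):

```python
# Rule-based classification: a category is assigned when a hand-written
# keyword rule fires. RULES is a hypothetical rule set, not from the report.
RULES = {
    "shipping": {"delivery", "shipping", "arrived", "package"},
    "quality": {"broken", "defective", "sturdy", "durable"},
    "price": {"cheap", "expensive", "price", "refund"},
}

def rule_based_classify(text, default="other"):
    """Return the first category whose keyword set overlaps the text."""
    tokens = set(text.lower().split())
    for category, keywords in RULES.items():
        if tokens & keywords:
            return category
    return default

print(rule_based_classify("The package arrived two days late"))  # shipping
```

The weakness this sketch exposes is exactly what the machine learning approaches in Section IV address: rules must be written and maintained by hand, while a learned classifier derives its "rules" from the labelled training data.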
II. Ethical discussion
The dataset includes the review text, the review score, and the product category with ratings. It contains no information about the authors of the reviews, so there are no concerns about their privacy. This research uses the multinomial naive Bayes and logistic regression machine learning models. Machine learning is used to categorize the text, and there is a chance that the classification will be inaccurate due to a misreading of the words in the review text. It is also possible that the dataset is noisy and untrustworthy; as a result, the dataset, and therefore the machine learning models, may be biased. Now that the data has been obtained, it is time to put it to use. The following sections go over the steps necessary to predict sentiment from the reviews; these procedures can benefit any text categorization task. To train the text classifiers, we use Python's scikit-learn machine learning framework.
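The scikit-learn workflow can be sketched end to end as a pipeline. The four training reviews and their labels below are invented stand-ins for the real labelled data:

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny invented training set standing in for the labelled review data.
reviews = [
    "great product works perfectly",
    "excellent quality highly recommend",
    "terrible waste of money",
    "broke after one day very disappointed",
]
labels = ["positive", "positive", "negative", "negative"]

# Pipeline: TF-IDF feature extraction followed by a naive Bayes classifier.
model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", MultinomialNB()),
])
model.fit(reviews, labels)

print(model.predict(["terrible product very disappointed"]))
```

Bundling the vectorizer and the classifier into one Pipeline object ensures that exactly the same feature extraction is applied at training and at prediction time.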
III. Dataset Preparation
The pandas read_csv function is used to read the dataset for this task. The dataset's properties are then explored using various EDA methods. Its size is given by the number of rows and columns: the dataset contains 31887 rows and 5 columns. The info method can be used to inspect the data type of each column. The isnull method can be used to find null values, and the given dataset has them. The unique values in each column can be seen with the unique method; the confidence score and acceptance status columns contain NaN values. The dropna() method is used to eliminate the null and NaN values. Because dropping rows removes some index values, the reset_index method is applied to the dataset to renumber the index. The value_counts function displays the number of values under each unique label.
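The EDA steps above can be sketched as follows. The column names are assumptions based on the fields the report mentions, and a small hand-built DataFrame stands in for the real 31887-row file, which would be loaded with pd.read_csv:

```python
import pandas as pd
import numpy as np

# Hypothetical stand-in for the real CSV; column names are assumed
# from the fields described in the report.
df = pd.DataFrame({
    "review_text": ["great item", "broke quickly", "works fine", "bad"],
    "review_score": [5, 1, 4, 2],
    "product_category": ["toys", "tools", "toys", "tools"],
    "confidence_score": [0.9, np.nan, 0.8, 0.7],
    "acceptance_status": ["accepted", "rejected", np.nan, "accepted"],
})

print(df.shape)                        # (rows, columns)
df.info()                              # data type of each column
print(df.isnull().sum())               # null/NaN count per column
print(df["acceptance_status"].unique())

# Drop rows containing NaN, then renumber the index, as described above.
clean = df.dropna().reset_index(drop=True)
print(clean["acceptance_status"].value_counts())
```

Note that dropna() removes a whole row if any column is NaN, which is why reset_index is needed afterwards to restore a contiguous 0..n-1 index.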
The countplot function of the seaborn module is used to plot the distribution of data by category for acceptance status. The next step, after the dataset has been imported, is to pre-process the text. Text may contain numbers, special characters, and unwanted spaces. Depending on the problem, we may or may not need to remove them; for the sake of clarity, we delete all special characters, numbers, and unwanted spaces from our text. Stop words are removed: these are ordinary words such as a, the, either, else, and ever that contribute nothing to the classification. For our purposes, "the election was over" becomes "election over", and "a very close game" becomes "very close game". Words are lemmatized: several inflections of the same word are grouped together, so election, elections, elected, and similar terms are clustered and counted as instances of the same word. N-grams are used: instead of single words, we may count sequences of terms, such as "clean match" and "close election". TF-IDF is used: instead of just counting frequency, we go a step further and penalize terms that appear often in the majority of the documents.
IV. Methods
Logistic regression and the multinomial naive Bayes algorithm are the two machine learning models used in this work. The multinomial naive Bayes technique is particularly well suited to classification problems with discrete features; text classification, for example, benefits because text features such as word counts are discrete [6]. The technique works with both integer and fractional count features. Count vectorization and TF-IDF are two feature representations that work with this technique to enable efficient classification [7]. This algorithm is the most accurate for the data.
1. Multinomial naïve Bayes:
The Gaussian assumption is by no means the only straightforward assumption that could be used to determine the generative distribution for each label. Multinomial naive Bayes is another useful example, in which the features are assumed to be generated by a simple multinomial distribution. Because the multinomial distribution describes the probability of observing counts across multiple categories, multinomial naive Bayes is best suited to features that represent counts or count rates. The concept is the same as before, except that instead of modeling the data distribution with the best-fit Gaussian, we model it with the best-fit multinomial distribution.
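A minimal multinomial naive Bayes example on invented count data, showing the per-class multinomial word probabilities the model fits:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Invented toy reviews; MultinomialNB treats each document's word-count
# vector as a draw from a per-class multinomial distribution.
texts = ["good good great", "great product", "bad awful", "awful bad bad"]
y = ["pos", "pos", "neg", "neg"]

counts = CountVectorizer()
X = counts.fit_transform(texts)          # integer word-count features
clf = MultinomialNB().fit(X, y)

# feature_log_prob_ holds log P(word | class) for the fitted multinomial.
print(clf.classes_)
print(np.exp(clf.feature_log_prob_).round(2))
print(clf.predict(counts.transform(["good product"])))
```

The exponentiated feature_log_prob_ rows sum to 1 for each class, which is the "best-fit multinomial distribution" described above (with Laplace smoothing applied by default).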
2. Logistic regression
The second classification approach is multinomial logistic regression, or MaxEnt for short. Logistic regression belongs to the family of classifiers known as exponential, or log-linear, classifiers. Like naive Bayes, a log-linear classifier works by extracting a set of weighted features from the input, taking logs, and combining them linearly (that is, each feature value is multiplied by a weight and the results are added up). Although we sometimes use the shorthand logistic regression even when talking about multiple classes, logistic regression
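The log-linear view described above (weighted features combined linearly) can be checked directly against scikit-learn's LogisticRegression; the toy training texts below are invented:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Invented binary-labelled texts (1 = positive, 0 = negative).
texts = ["loved it wonderful", "wonderful product", 
         "hated it terrible", "terrible boring"]
y = [1, 1, 0, 0]

vec = TfidfVectorizer()
X = vec.fit_transform(texts)
clf = LogisticRegression().fit(X, y)

# The log-linear view: the decision score is the weighted sum of the
# input features plus a bias term, w.x + b.
x = vec.transform(["wonderful wonderful"])
manual_score = x @ clf.coef_.ravel() + clf.intercept_[0]
assert np.allclose(manual_score, clf.decision_function(x))
print(clf.predict(x))  # class 1 when the score is positive
```

The assertion confirms that the fitted model really is a linear combination of weighted features, exactly as the log-linear description says; the sigmoid (or softmax, in the multinomial case) only converts that score into a probability.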