
Application of Machine Learning Assignment 2022

To apply skills and knowledge acquired throughout the semester in classification algorithms and machine learning process.


Added on  2022-10-06

Application of machine learning in R
Executive summary
With the rapid growth of social media platforms, the internet has become an established venue
for online study and for sharing and exchanging ideas and opinions. Social media comprises a
large amount of data in the form of tweets, posts, blogs, status updates and so on. This report
discusses Twitter, one of the best-known social media platforms used for microblogging. Twitter
holds a huge quantity of sentiment data (tweets) that can be analyzed to extract users' ideas
and suggestions. The main goal of this report is to explore how machine learning techniques can
be used to analyze data in a sequence of posts, focusing on trends in the languages used for
tweets as well as tweet volumes. Experimental evaluation indicates that the suggested
classifiers are effective and efficient in terms of specificity, sensitivity and accuracy. The
suggested algorithms are implemented in R using RStudio.
Key words – Machine learning, sentiments, Twitter, RStudio.
Introduction
Social media platforms are among the few sources that contain many important types of
information. People post their views on many topics, discuss current issues, complain about
products and share positive statements about the products they use in daily life. Data mining,
or sentiment analysis, is the process of extracting high-quality information from a given text
(Cordon et al., 2018). More specifically, it is the process of extracting well-organized data
from unstructured data, which helps in measuring customer opinion, product reviews and reports.
Unstructured data refers not only to company figures and tables but also to information drawn
from the internet, such as e-mails, chats, social media sites, PDFs and Word files. Structured
data can be analyzed easily and yields better results. With unstructured data such as Twitter
posts and e-mail, however, obtaining output is harder because of challenges such as non-unique
data and the effect of virtual noise (Kumari, Vidya & Karitha, 2019). This report focuses on
Twitter, a social media platform known all over the world. The report consists of several
sections. The first section contains the literature review, which surveys other authors' books
and journal articles on machine learning. The second section addresses different machine
learning techniques. The third section explains the development of the five algorithms
discussed in this report. The last section covers the performance of the models and the
conclusions. All models are generated using both the training and test sets of data sets one
and two (Arora et al., 2018). The five models are decision tree, random forest, naïve Bayes and
logistic regression, which are all supervised machine learning methods, and k-means clustering,
which is an example of unsupervised machine learning.
Literature review
Many researchers have published studies on sentiment analysis techniques over time. Data mining
techniques improve the results of classification, feature selection and the various data
pre-processing steps. This research focuses on both supervised and unsupervised approaches to
the data mining task. Various studies have defined multiple facets of data mining, such as
opinion-oriented and feature-extraction approaches (Bowers, Alex & Xiaoliang Zhou, 2019).
Supervised machine learning classifiers need different features to learn from so that their
output can be compared meaningfully. Data sets are collected and then pre-processed before
supervised machine learning is applied (Bowers, Alex & Xiaoliang Zhou, 2019). Various
classifiers and approaches, such as naïve Bayes, have been used and evaluated in terms of
precision, recall, F-measure and accuracy.
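The evaluation measures named above can all be derived from the four cells of a binary
confusion matrix. The following is a minimal sketch in Python (chosen here for a
dependency-free illustration; the report's own models are built in R). The function names and
the toy label vectors are assumptions made for this example and are not taken from the
assignment data.

```python
def confusion_counts(actual, predicted, positive="positive"):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(actual, predicted, positive="positive"):
    """Return accuracy, precision, recall (sensitivity), specificity and F-measure."""
    tp, fp, fn, tn = confusion_counts(actual, predicted, positive)
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,  # also called sensitivity
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        # F-measure (F1) is the harmonic mean of precision and recall
        "f_measure": (2 * precision * recall / (precision + recall)
                      if precision + recall else 0.0),
    }


# Hypothetical labels, for illustration only
actual = ["positive", "positive", "positive", "negative",
          "negative", "negative", "negative", "positive"]
predicted = ["positive", "positive", "negative", "negative",
             "negative", "positive", "negative", "positive"]

metrics = classification_metrics(actual, predicted)
print(metrics)
```

Sensitivity is another name for recall, so the same function also covers the specificity and
sensitivity figures mentioned in the executive summary.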
Supervised machine learning algorithms and their importance
The majority of practical machine learning applications use supervised methods. Here, input
variables and an output variable are used to learn a mapping from input to output. The main
goal of supervised machine learning is to approximate this mapping function well enough that
it can be used to predict the output variable for new inputs. The name "supervised" is used
because the process learns from a labeled training data set. The algorithm repeatedly makes
predictions on the training data and is corrected, and learning stops once an acceptable level
of performance is reached. There are two kinds of supervised machine learning: classification
and regression. This report discusses four supervised machine learning algorithms: decision
tree, naïve Bayes, logistic regression and random forest. Decision trees classify instances by
sorting them according to their feature values (Benvenuto et al., 2018). In a decision tree,
each node represents a feature of the instance to be classified and each branch represents a
value that the node can assume. A decision tree therefore predicts by mapping observations
about an instance to its target value. The next type is naïve Bayes. It consists of a very
simple Bayesian network, a directed acyclic graph with only one parent (the unobserved node)
and several children (Naghibi et al., 2016). Naïve Bayes assumes that the features are
conditionally independent given the class; because this assumption rarely holds exactly, it
can be less accurate than other supervised machine learning models. The next type is logistic
regression. Here a binomial (two-class) outcome determines the classification, and the
association between the dependent variable and the independent variables is identified.
Logistic regression is regarded as a reliable classifier because it makes predictions through
estimated probabilities (Arora et al., 2018). Its performance can be evaluated using the
confusion matrix, the F-measure and AUC. The last supervised method discussed in this report is
the random forest. This method builds many decision trees and combines their votes for
classification. It can be fitted easily in RStudio using R libraries or packages. Its
performance is evaluated using the confusion matrix.
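To make the naïve Bayes idea concrete for tweet-like text, here is a minimal sketch of a
multinomial naïve Bayes classifier with add-one (Laplace) smoothing. It is written in Python
for a dependency-free illustration and is not the R implementation used in the report. The
training tweets, labels and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train_naive_bayes(documents, labels):
    """Collect class frequencies and per-class word frequencies."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in zip(documents, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary


def predict(model, text):
    """Return the class with the highest log-posterior score."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    vocab_size = len(vocabulary)
    best_label, best_score = None, float("-inf")
    for label, doc_count in class_counts.items():
        score = math.log(doc_count / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one smoothing avoids zero probabilities for unseen pairs
            likelihood = (word_counts[label][token] + 1) / (total_words + vocab_size)
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy tweets, for illustration only
tweets = [
    "I love this phone",
    "great service and great battery",
    "what a terrible update",
    "I hate the slow battery",
]
sentiments = ["positive", "positive", "negative", "negative"]

model = train_naive_bayes(tweets, sentiments)
print(predict(model, "love the great service"))
print(predict(model, "terrible slow update"))
```

The independence assumption shows up in the inner loop: each word contributes its own
log-likelihood term, regardless of which other words appear in the tweet.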
Unsupervised machine learning algorithms and their importance
In this case we have only input variables and no output variables. The main goal of this method
is to model the underlying structure of the data in order to learn more about the data
(Karatay et al., 2016). Moreover, the algorithms discover and present interesting structure in
the data on their own. Unsupervised machine learning comprises clustering and association
problems. This report discusses only k-means clustering. It groups the data into a chosen
number of clusters (k clusters) and can be used when labeled data is unavailable.
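The k-means procedure can be sketched as repeated assignment and update steps (Lloyd's
algorithm). The Python sketch below is illustrative only and is not the R code used in the
report. It assumes two-dimensional toy points and, as a simplification, uses the first k points
as the initial centers, whereas practical implementations usually pick starting centers at
random.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, max_iterations=100):
    """Cluster points into k groups; returns (labels, centers)."""
    # Simplifying assumption: the first k points serve as initial centers
    centers = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest center
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its assigned points
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers


# Hypothetical 2-D points forming two visibly separate groups
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]

labels, centers = k_means(points, k=2)
print(labels)
print(centers)
```

No class labels are passed to the function; the grouping comes only from the distances between
points, which is what makes the method unsupervised.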
Creation of classification models
The data sets provided were in text form and could be opened and viewed in Notepad. First, the
data was converted from text to a CSV file in Excel for easier analysis, because most machine
learning models are more easily built from numeric data than from text data; text data is
mainly used in text and natural language analysis. After conversion to a CSV file, the first
row in Excel was used to insert the variable names as given by the JSON format in the
requirement file. Thereafter, the CSV file was imported into RStudio and viewed. The data set
consists of different variables and is in tabular form. After importing the data, we dig deeper
into the models mentioned above, starting with the creation of the logistic regression model.
Before
