Text Analysis of YouTube Channels


Added on  2020/02/19


Customer Analytics with Social Media
(Assignment 1 - Social Media Analysis for Understanding Customer
Preferences and Sentiments)
By
<Student Name>
(18833953)

Table of Contents
1. Introduction
2. Case Study: 1
2.1 Properties of articles
Number of shares
2.2 Keyword analysis using SAS Text Miner
3. Case Study: 2
Appendix
List of Figures
Figure 1 Title length (Overall): Number of shares
Figure 2 Content length (Overall) : Number of shares
Figure 3 Published on the weekend (Overall) : Number of shares
Figure 4 Top 10 shares : Average title lengths
Figure 5 Top 10 shares : Average content lengths
Figure 6 Top 10 shares : Published in weekdays:
Figure 7 Topics of whole data vs Share
Figure 8 qplot of sentiments
List of Tables
Table 1 “Lifestyle” channel : Top 10 topics
Table 2 “Entertainment” channel : Top 10 topics
Table 3 “Business” channel: Top 10 topics
Table 4 “Social Media” channel : Top 10 topics
Table 5 “Technology” channel : Top 10 topics
Table 6 “World” channel : Top 10 topics
Table 7 “Whole” data : Top 10 topics

1. Introduction
Text mining applies the principles of data mining to text. It is an automated process that identifies
and reveals previously undiscovered patterns in text data. Sentiment analysis helps us obtain the
attitudes, feelings, and views of individuals and groups from textual content. It is usually applied to
sentences and short messages, but it can also be applied to an entire passage to assess the importance
of a view or outlook¹. Combining text mining with sentiment analysis provides a significant amount of
descriptive and predictive power for textual content on social media¹. In this study, our main purposes
are to assess the impact of articles in terms of sharing, to understand movie reviewers’ sentiment on a
social networking site, and to compare the emotions obtained with different text mining methods. In
Case Study 1, our aim is to discover the connections among keywords, number of shares, and
expressions, which would help formulate a strategy to improve advertising campaigns and
communication with customers. In Case Study 2, our aim is to understand the sentiment of consumers’
reviews, which would help find insights to enhance campaign tracking.
¹ Khan, F. H., Bashir, S., & Qamar, U. (2014). TOM: Twitter opinion mining framework using hybrid
classification scheme. Decision Support Systems, 57, 245-257.
2. Case Study: 1
2.1 Properties of articles
There are 2100 observations for lifestyle, 7059 for entertainment, 6259 for business, 2325 for social
media, 7345 for technology, 8425 for world, and 6147 observations that are not associated with any of
these channels.
Number of shares
The highest number of shares overall, 843300, was for “Leaked: More Low-Cost iPhone Photos”, which
does not fall under any of the six channels mentioned above.
Under the lifestyle channel, the maximum number of shares was observed for “Obama to Discuss NSA
Reform with Lawmakers”, i.e. 208300.
Under the entertainment channel, the maximum number of shares was observed for “Sprint's New Plans
Guarantee Unlimited Data for Life”, i.e. 210300.
Under the business channel, the maximum number of shares was observed for “Dove Experiment Aims to
Change the Way You See Yourself”, i.e. 690400.
Under the social media channel, the maximum number of shares was observed for “World's First
Sprout-Powered Battery Just Lit Up a Christmas Tree [VIDEO]”, i.e. 122800.
Under the technology channel, the maximum number of shares was observed for “Startup stories from
early hires”, i.e. 663600.
Under the world channel, the maximum number of shares was observed for “U.S. Will Now Monitor All
Travelers From Ebola Zone for 21 Days”, i.e. 284700.
Characteristics of highly shared articles
Overall, for the 10 most shared articles, the average title length is 51.1, the average content length is
4307.4, and 90% of them were published on weekdays. For the most shared article, the title length and
content length are 35 and 2073 respectively, and it was published on a weekday.
A comparison of the overall number of shares with title length, content length, and whether the article
was published on the weekend is shown in Figures 1, 2, and 3. Figure 1 shows that the title length of the
most shared articles is usually between 40 and 70. Figure 2 shows that the content length of the most
shared articles is less than 6000, and Figure 3 shows that most of the shares (80%) belong to articles
published during weekdays.
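The summary figures above (average title length, average content length, and share of weekday publications among the most shared articles) follow a simple recipe. The following is a minimal Python sketch of that recipe, not the workflow used in this study: the records, their field order, and the helper name summarize_top are hypothetical, and title length is assumed to be counted in characters.

```python
from statistics import mean

# Hypothetical records: (title, shares, content_length, published_on_weekend)
articles = [
    ("Alpha headline", 900, 2000, False),
    ("A much longer beta headline", 500, 4000, True),
    ("Gamma", 100, 1000, False),
    ("Delta story title", 700, 3000, False),
]


def summarize_top(records, n):
    """Summarize the n records with the highest number of shares."""
    top = sorted(records, key=lambda r: r[1], reverse=True)[:n]
    return {
        # Title length is assumed to be the number of characters.
        "avg_title_length": mean(len(r[0]) for r in top),
        "avg_content_length": mean(r[2] for r in top),
        # Fraction of the top records published on a weekday.
        "weekday_fraction": sum(1 for r in top if not r[3]) / len(top),
    }


summary = summarize_top(articles, 3)
print(summary)
```

Replacing the hypothetical list with the real article records and calling summarize_top(records, 10) would give the kind of top-10 summary reported in this section.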
Figure 1 Title length (Overall): Number of shares
Figure 2 Content length (Overall) : Number of shares
Figure 3 Published on the weekend (Overall) : Number of shares
Figures 4, 5, and 6 compare the top 10 shared articles across the channels. For the highest shared
article, the title length and content length are 52.4 and 5463.8 respectively, and it was published on a
weekday.
Figure 4 Top 10 shares : Average title lengths
Figure 5 Top 10 shares : Average content lengths
Figure 6 Top 10 shares : Published in weekdays:
2.2 Keyword analysis using SAS Text Miner
Keywords have been extracted from the titles for all the channels using text parsing, as shown in
Appendix 7 to Appendix 12.
Using Text Topic, the top 10 topics of each category have been identified. For Text Filter, the frequency
weighting has been set to Log and the term weight has been kept at Default (inverse document frequency
has not been used, as it is only recommended for documents that are longer than a paragraph). Tables 1
to 6 contain the top 10 topics for each channel, i.e. lifestyle, entertainment, business, social media,
technology, and world. A few topics related to video, Facebook, Google, social media, and apps appear
across all the channels.
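To illustrate what a logarithmic frequency weighting does, the following is a minimal Python sketch. It assumes the common definition log2(f + 1), where f is the raw count of a term in a document; the exact formula applied by SAS Text Miner should be checked against its documentation. The tokenization (lowercasing and splitting on whitespace) and the function name log_frequency_weights are hypothetical simplifications.

```python
import math
from collections import Counter


def log_frequency_weights(title):
    """Return log2(f + 1) for every whitespace-separated term in a title.

    Assumption: Log weighting is log2(raw frequency + 1).
    """
    counts = Counter(title.lower().split())
    return {term: math.log2(freq + 1) for term, freq in counts.items()}


# Hypothetical title with repeated terms.
weights = log_frequency_weights("Apple app beats apple app store app")
print(weights)
```

The effect is to damp repeated terms: a term that occurs three times receives twice the weight of a term that occurs once, not three times.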
Table 1 “Lifestyle” channel : Top 10 topics
Topic | Number of Terms | # Docs
apps,kids,top,week,+miss | 13 | 142
+video,+game,music,+vine,youtube | 18 | 161
app,+help,facebook,+launch,phone | 22 | 118
+day,valentine,gifts,tech,ways | 20 | 93
social,media,+brand,digital,+event | 20 | 51
iphone,+case,apps,best,apple | 20 | 123
world,cup,+game,watch,digital | 15 | 54
+know,news,apple,things,tech | 27 | 131
+photo,instagram,+show,snapchat,first | 23 | 96
google,+glass,apple,maps,facebook | 32 | 75
Table 2 “Entertainment” channel : Top 10 topics
Topic | Number of Terms | # Docs
+video,music,+trailer,+man,viral | 46 | 692
social,media,+day,+week,+job | 39 | 215
+game,thrones,+play,+video,season | 37 | 286
google,+glass,+know,news,+report | 50 | 269
+world,+cup,+day,best,first | 44 | 205
+star,wars,+movie,nba,first | 64 | 273
+watch,super,bowl,+show,online | 58 | 509
facebook,users,news,+know,app | 67 | 318
apple,iphone,app,first,+launch | 94 | 553
twitter,nba,accounts,nfl,+star | 55 | 281
Table 3 “Business” channel: Top 10 topics
Topic | Number of Terms | # Docs
+video,+game,viral,trailer,+show | 42 | 508
+know,news,apple,+story,earnings | 28 | 196
social,media,brands,marketing,+event | 36 | 199
+ad,super,bowl,+watch,youtube | 31 | 289
facebook,mobile,users,app,+search | 51 | 299
apple,iphone,app,+report,samsung | 69 | 468
google,+glass,app,android,+photo | 40 | 239
twitter,+chat,+live,+founder,startup | 48 | 252
world,cup,+time,first,+win | 60 | 355
+job,+time,+live,+opening,york | 60 | 396
Table 4 “Social Media” channel : Top 10 topics
Topic | Number of Terms | # Docs
+miss,digital,resources,media,apps | 4 | 104
facebook,users,+home,+look,feed | 13 | 231
twitter,+account,tweets,app,+follow | 16 | 177
+video,+vine,+game,youtube,music | 17 | 214
reddit,facts,+week,learned,fascinating | 7 | 66
social,media,+network,+day,infographic | 18 | 91
top,tech,+week,comments,pics | 18 | 96
+photo,mashpics,challenge,mashable,+moment | 19 | 86
google,app,news,+know,+report | 31 | 180
world,+know,news,cup,+day | 19 | 157
Table 5 “Technology” channel : Top 10 topics
Topic | Number of Terms | # Docs
iphone,5s,apple,+photo,+case | 54 | 275
google,+glass,app,+search,android | 27 | 439
+top,+week,tech,hands-on,apps | 19 | 248
+video,+game,+vine,+watch,youtube | 48 | 634
samsung,galaxy,s4,+smartphone,note | 43 | 249
+know,news,facebook,things,twitter | 31 | 206
world,cup,first,+day,+time | 45 | 387
apps,+miss,+want,android,hands-on | 19 | 275
facebook,app,+phone,twitter,+launch | 89 | 694
apple,ipad,ios,+launch,+watch | 68 | 560
Table 6 “World” channel : Top 10 topics
Topic | Number of Terms | # Docs
twitter,+account,+game,facebook,users | 83 | 367
+video,+game,music,+watch,+ad | 64 | 549
+world,cup,+day,first,brazil | 49 | 322
google,+glass,+game,+street,+view | 52 | 295
apple,iphone,+report,app,ios | 83 | 621
social,media,+event,marketing,digital | 42 | 186
+photo,+show,+game,first,+star | 94 | 612
+know,news,things,apple,obama | 36 | 207
facebook,app,+ad,first,+time | 66 | 548
u.s.,+city,obama,+time,space | 130 | 909
The top 10 topics in the whole data have also been identified using the Text Topic method and are
shown in Table 7.
Table 7 “Whole” data : Top 10 topics
Topic | Number of Terms | # Docs
google,glass,app,android,doodle | 91 | 1414
+video,+game,+watch,watch,thrones | 168 | 2700
facebook,twitter,app,users,ads | 137 | 1365
+know,news,apple,apple,feed | 72 | 768
iphone,apple,app,ios,ipad | 188 | 2500
media,social,digital,resources,+day | 75 | 874
world,cup,twitter,+day,+world | 175 | 2721
apps,week,+want,+top,tech | 95 | 1614
video,recap,viral,+week,music | 128 | 1410
samsung,galaxy,+tv,s4,s5 | 146 | 1054
Figure 7 Topics of whole data vs Share
The shares across the top 10 topics of the complete dataset are depicted in Figure 7. Topic 5 (iphone,
apple, app, ios, ipad) has the maximum number of shares, i.e. 6884371. Topic 7 (world, cup, twitter,
+day, +world) has the minimum number of shares, i.e. 84487.
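Totalling shares per topic and picking the extremes is a simple aggregation. The following is a minimal Python sketch under stated assumptions: each document is assumed to carry a single topic id and a share count, the pairs below are hypothetical, and the helper shares_by_topic is not part of SAS Text Miner.

```python
from collections import defaultdict

# Hypothetical (topic_id, shares) pairs, one per document.
doc_topics = [(5, 1200), (7, 300), (5, 800), (2, 950), (7, 100)]


def shares_by_topic(pairs):
    """Sum the number of shares for every topic id."""
    totals = defaultdict(int)
    for topic, shares in pairs:
        totals[topic] += shares
    return dict(totals)


totals = shares_by_topic(doc_topics)
top_topic = max(totals, key=totals.get)     # topic with the most shares
bottom_topic = min(totals, key=totals.get)  # topic with the fewest shares
print(totals, top_topic, bottom_topic)
```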
3. Case Study: 2
All the required packages have been installed and their libraries loaded. Using
“sentiment_test.csv”, an R data file named “sentimentdata” has been created. For dictionary-based
sentiment analysis, the get_sentiment function with the ‘syuzhet’ method and the sign function
have been used. The sign function changes all positive numbers to plus one and all negative
numbers to minus one, and keeps all zeros as zero. The values 1, -1, and 0 have been defined as
“pos”, “neg”, and “neu” respectively. For the syuzhet method, a total of 180 sentiments differ from
the original fact data.
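The labelling and comparison steps described above can be illustrated with a minimal sketch. It is written in Python as a stand-in and is not the R code used in this study: the scores and actual labels are hypothetical, and the helpers sign, label_scores, and count_mismatches are illustrative names that only mirror the behaviour of the sign step and the comparison with the original data.

```python
def sign(x):
    """Return 1 for positive numbers, -1 for negative numbers, 0 for zero."""
    return (x > 0) - (x < 0)


# Mapping of the sign values to the labels used in this report.
LABELS = {1: "pos", -1: "neg", 0: "neu"}


def label_scores(scores):
    """Convert numeric sentiment scores to "pos", "neg", or "neu"."""
    return [LABELS[sign(s)] for s in scores]


def count_mismatches(predicted, actual):
    """Count positions where the predicted label differs from the actual one."""
    return sum(p != a for p, a in zip(predicted, actual))


# Hypothetical dictionary scores and hypothetical actual labels.
scores = [0.75, -1.2, 0.0, 2.5]
actual = ["pos", "neg", "pos", "neg"]

predicted = label_scores(scores)
mismatches = count_mismatches(predicted, actual)
print(predicted, mismatches)
```

Running the same comparison once per dictionary method would give one mismatch count per method, which is the form of the figures reported in this case study.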
Sentiment categories have been predicted using the syuzhet function ‘get_sentiment’, and then the
other methods, ‘bing’, ‘afinn’, and ‘nrc’, have been used. For the bing method, 174 sentiments
differ from the original fact data. For the afinn method, 162 sentiments differ from the original
fact data. For the nrc method, 243 sentiments differ from the original fact data.
Figure 8 qplot of sentiments
Eight emotions and two sentiment polarities (positive and negative) have been identified using
“get_nrc_sentiment”: positive, anticipation, negative, trust, joy, anger, surprise, fear, disgust, and
sadness. The counts of these categories are 327, 219, 199, 183, 169, 105, 94, 89, 87, and 86
respectively. The qplot has been plotted using ‘ggplot2’ and is depicted above (Figure 8).
Appendix
Appendix 1 Project Creation “18833953Assignment1_Task1” & 2 diagrams “Task1” and
“Task1_file_creation”
Appendix 2 “Task1_file_creation” diagram
Appendix 3 datasets import
Appendix 4 Definitions of roles
Appendix 5 Text Parsing: 6 channels
Appendix 6 Parts of speech : ignore
Appendix 7 Text Parsing output of a channel
Appendix 8 Channel technology : Text Filter output
Appendix 9 Channel world : Text Filter output
Appendix 10 Channel lifestyle: Text Filter output
Appendix 11 Channel entertainment : Text Topic output
Appendix 12 Channel business :Text Topic output
Appendix 13 Channel social media : Top 10 Topics
Appendix 14 Channel technology : Top 10 Topics
Appendix 15 Channel world : Top 10 Topics
Appendix 16 Overall : Top 10 Topics
Appendix 17 Installation of packages and data creation
Appendix 18 Creation of Library and other method validation