Data Analysis of YouTube Dataset using IBM Watson: ITECH1103 Project

Added on  2025/04/08

ITECH1103 BIG DATA AND ANALYTICS
Contents
Task 1 – Background Information
Task 2 Reporting / Dashboards
Task 3 – Advanced Insights
Task 4 - Research
Task 5 – Recommendations
Task 6: Cover letter
Task 7 – Team Reflection
References
Task 1 – Background Information
Data analysis is carried out to explore a dataset and extract knowledge from it. In this project,
the IBM Watson tool is used to analyze the dataset. Today's world generates a large amount of data,
which can be used effectively to improve businesses and organizations (Journal of big data, 2014).
Through analysis, interesting information and the latest trends can be researched and identified.
The tool can learn from and analyze datasets, and various interesting insights can be gained as a
result (Tsumoto, 2013). It is not limited to exploring and analyzing: it also supports visualization
of the information and insights, which helps in understanding the data effectively and efficiently.
The analysis is done for the Content Analyst of ABC, an online multimedia company. The company wants
to gain insights into YouTube data so that it can shape its policies for the long run.
The analysis is performed on the YouTube dataset, which was downloaded from data.world. The dataset
includes information about YouTube videos from 2006 to 2018. It covers videos of different
categories, such as Music, Education, and Gaming. The parameters included in the dataset are
Video_id, Tags, the weekday on which the video was published, the date on which the video was
trending, the name of the channel where the video was published, the time at which the video was
published, the country where the video was published, the number of views and likes, etc. The
insights from the data are presented as answers to the questions asked. Graphs and charts are shown
to help convey the information effectively.
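As a rough illustration of how fields like these could be inspected outside IBM Watson, the pandas sketch below parses a tiny made-up sample and derives the year, weekday, and hour of publication. The column name `Publish_timestamp`, the derived column names, and all values are assumptions for illustration; the real file from data.world may be laid out differently.

```python
import io

import pandas as pd

# Hypothetical three-row sample. Column names loosely follow the report's
# wording and may differ from the actual file; the values are made up.
SAMPLE_CSV = """Video_id,Channel_Title,Publish_country,Publish_timestamp,views,likes
a1,ChannelA,US,2018-01-04 16:30:00,1000,50
b2,ChannelB,GB,2017-12-29 17:05:00,2500,120
c3,ChannelA,US,2018-01-05 16:45:00,400,10
"""


def load_sample(csv_text: str) -> pd.DataFrame:
    """Parse CSV text and add year, weekday, and hour columns."""
    df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Publish_timestamp"])
    df["publish_year"] = df["Publish_timestamp"].dt.year
    df["publish_weekday"] = df["Publish_timestamp"].dt.day_name()
    df["publish_hour"] = df["Publish_timestamp"].dt.hour
    return df


videos = load_sample(SAMPLE_CSV)
print(videos[["Video_id", "publish_year", "publish_weekday", "publish_hour"]])
```

Derived columns of this kind are what questions about the publication year, weekday, or hour would be grouped on.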
The IBM Watson Analysis tool supports several tasks: exploring and manipulating data, analyzing
data descriptively using NLP, predicting numerical and categorical values, and visualizing data to
gain knowledge and insights (Eysenbach et al, 2016).
Task 2 Reporting / Dashboards
The results of the data analytics can be placed on dashboards provided by the IBM Watson Analysis
tool. Visualizations such as graphs and charts can also be stored on a dashboard. The answers to
the questions asked are collected in one dashboard, where they are visualized through the charts
and graphs it contains.
By exploring and analyzing the dataset, we found the following insights:
1. Count of all videos uploaded as per the dataset: The number of videos in the dataset is 55885.
It was computed by counting the Video_id parameter.
2. Number of categories of uploaded videos: This was computed with the help of Category_id and
comes out to be 18.
3. Count of all countries in the dataset: The YouTube dataset contains 4 countries. This was
computed with the help of Publish_country.
4. Number of distinct channels in the YouTube dataset: The distinct channels were computed with
the help of Channel_Title, using a count operation.
5. The 3 highest countries by number of channels in the YouTube dataset: The Visualize tool was
used to find the countries with the most channels. Visualization is selected from the Data
Refinery tab; from there, go to Charts and select a chart appropriate for the insight required.
The 3 highest countries by number of channels were:
1. France
2. Canada
3. US
France has the highest number of channels, and US has the lowest of these 3. According to the
graph, GB has the lowest number of channels overall.
6. Country with the lowest number of channels: According to the graph, GB is the country with the
lowest number of channels. This can also be computed with the help of Channel_Title.
7. Number of distinct channels in US: The data was filtered on Publish_country for US. The count
of channels comes out to be 2207.
8. Top 10 videos by country: The top 10 videos in the different countries are as follows:
9. Bottom 10 videos by country: The bottom 10 videos in the different countries are as follows:
10. Number of years covered by the uploaded videos in the dataset: The count of distinct
publication years in the dataset comes out to be 13. This was found with the help of the
Publish_date summary.
11. Videos uploaded in the last month: The data was filtered on Publish_date by month. The number
of videos uploaded in the last month comes out to be 8397.
12. Year with the most uploaded videos in GB: The year with the highest number of uploaded videos
in GB comes out to be 2018.
13. Hour in which most videos were uploaded, and how it differs between countries: Across the
dataset, most videos were uploaded between 16:00 and 16:59. When compared across the different
countries, the time frame came out to be 17:00 – 17:59.
14. Top 3 viewed categories, based on the count of uploaded videos: The top 3 categories are as
follows:
1. Music
2. Entertainment
3. People and Blogs
15. Bottom 3 viewed categories, based on the count of uploaded videos: The bottom 3 categories are
as follows:
1. News and politics
2. Sports
3. Comedy
16. Video with the highest number of likes: The video with the most likes comes out to be Romeo Sa
RomeoSantosVE Video.
17. Video with a low number of likes: The video with low likes comes out to be Coachella Coachella
video.
18. Day of the week with the greatest count of videos: Thursday and Friday are the days on which
the greatest count of videos was uploaded.
19. Day of the week with the lowest count of videos: Monday was the day on which the lowest count
of videos was uploaded.
20. Monthly breakdown of all published videos: The breakdown obtained shows the lowest count of
uploads on Monday and the highest counts on Thursday and Friday.
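Several of the answers above reduce to distinct counts and grouped counts. The report obtained them through the IBM Watson interface; the pandas sketch below shows one way the same kind of computation could be expressed. The rows are made up, `publish_weekday` is a hypothetical derived column, and the helper names are illustrative. The report does not state whether its video count is a row count or a distinct-id count, so the sketch uses distinct counts as an assumption.

```python
import pandas as pd

# Illustrative rows only; the values are NOT taken from the real dataset.
# Column names follow the report (Video_id, Category_id, Publish_country,
# Channel_Title); "publish_weekday" is a hypothetical derived column.
videos = pd.DataFrame({
    "Video_id": ["a1", "b2", "c3", "d4", "e5", "f6", "a1"],
    "Category_id": [10, 24, 10, 22, 24, 10, 10],
    "Publish_country": ["France", "France", "France", "Canada", "Canada",
                        "US", "France"],
    "Channel_Title": ["ChA", "ChB", "ChC", "ChD", "ChE", "ChF", "ChA"],
    "publish_weekday": ["Thursday", "Friday", "Thursday", "Friday",
                        "Thursday", "Monday", "Thursday"],
})


def distinct_count(df: pd.DataFrame, column: str) -> int:
    """Number of distinct values in one column."""
    return int(df[column].nunique())


def channels_per_country(df: pd.DataFrame) -> pd.Series:
    """Distinct channels per country, largest first."""
    counts = df.groupby("Publish_country")["Channel_Title"].nunique()
    return counts.sort_values(ascending=False)


def uploads_per_weekday(df: pd.DataFrame) -> pd.Series:
    """Number of rows per weekday, largest first."""
    return df["publish_weekday"].value_counts()


print(distinct_count(videos, "Video_id"))
print(channels_per_country(videos).head(3))
print(uploads_per_weekday(videos))
```

Filtering before counting, for example `videos[videos["Publish_country"] == "US"]`, would give the per-country variant used for the US channel count.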
Task 3 – Advanced Insights
The 5 insights from data exploration and analysis are as follows:
Figure 1: First observation
Here the views are compared across categories. The visualization shows that:
The highest category by views is 10.
After this comes 24.
The lowest are 30 and 44.
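A comparison of views by category such as this one can be approximated with a grouped sum. The snippet below is a minimal sketch with made-up view counts; the category ids echo those named in the text, but the numbers are not from the dataset and the `views` column name is an assumption.

```python
import pandas as pd

# Made-up view counts; only the category ids echo the report.
videos = pd.DataFrame({
    "Category_id": [10, 10, 24, 24, 30],
    "views": [5000, 7000, 4000, 3000, 200],
})


def views_by_category(df: pd.DataFrame) -> pd.Series:
    """Total views per category id, highest first."""
    totals = df.groupby("Category_id")["views"].sum()
    return totals.sort_values(ascending=False)


ranking = views_by_category(videos)
print(ranking)
```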
Figure 2: Second observation
A heat map was made for this analysis. The video ids shown in the big block have the highest
likes.
Figure 3: Third observation
The summary of likes comes out to be 10602434110 likes.
Figure 4: Fourth observation
The summary of dislikes comes out to be 3490.15.
Figure 5: Fifth observation
GB has a higher rate of views than France.
Task 4 - Research
The 20 prescribed questions were researched. In addition, 5 new insights were considered so that
the data could be researched properly to gain interesting insights.
Figure 6: New 5 insights on which research is done
In the first insight, views were compared by category id to obtain the number of views for each
category. This insight shows how views are distributed across categories. Categories that
currently attract few views should be given more focus so that their views grow and the company
stays competitive in the market.
In the second insight, the breakdown of likes by video id is shown using a heat map. The more area
a video id occupies in the heat map, the more likes it has. This shows where views can be improved
and gained.
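The area a video occupies in such a chart corresponds to its share of the total likes. A minimal sketch of that share calculation follows, assuming a `likes` column and using made-up values; if a video id appeared on several rows, this version would add those rows together.

```python
import pandas as pd

# Made-up like counts for three hypothetical video ids.
likes = pd.DataFrame({
    "Video_id": ["a1", "b2", "c3"],
    "likes": [600, 300, 100],
})


def like_share(df: pd.DataFrame) -> pd.Series:
    """Fraction of all likes held by each video id, largest first."""
    totals = df.groupby("Video_id")["likes"].sum()
    return (totals / totals.sum()).sort_values(ascending=False)


share = like_share(likes)
print(share)
```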
In the third and fourth insights, the summaries of likes and dislikes are considered so that the
two can be compared.
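The report does not say which summary statistic each figure uses, and a sum and a mean can differ by orders of magnitude, so a comparison is only meaningful when both columns use the same one. The sketch below computes both for each column on made-up values; the `likes` and `dislikes` column names are assumptions.

```python
import pandas as pd

# Made-up values for illustration only.
videos = pd.DataFrame({
    "likes": [100, 300, 600],
    "dislikes": [10, 20, 70],
})


def summarize(df: pd.DataFrame, columns=("likes", "dislikes")) -> pd.DataFrame:
    """Sum and mean of each selected column (rows: sum, mean)."""
    return df[list(columns)].agg(["sum", "mean"])


summary = summarize(videos)
# Ratio of total likes to total dislikes, using the same statistic for both.
ratio = summary.loc["sum", "likes"] / summary.loc["sum", "dislikes"]
print(summary)
print(ratio)
```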
In the fifth insight, 2 countries are considered and their views are compared. GB has a higher
rate of views than France.
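A country comparison like this depends on whether total or average views are compared, since countries can have different numbers of videos. The sketch below reports both per country; the values are made up and do not reproduce the report's figures.

```python
import pandas as pd

# Made-up view counts; not the dataset's actual values.
videos = pd.DataFrame({
    "Publish_country": ["GB", "GB", "France", "France"],
    "views": [900, 1100, 400, 600],
})


def views_per_country(df: pd.DataFrame) -> pd.DataFrame:
    """Total views, mean views per video, and video count for each country."""
    return df.groupby("Publish_country")["views"].agg(["sum", "mean", "count"])


by_country = views_per_country(videos)
print(by_country)
```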