Data Analysis of YouTube Videos: An Analytic Report (ITECH1103)
Added on 2025/04/08

ITECH1103- Big Data and Analytics
Group Assignment
ANALYTIC REPORT & PRESENTATION

Contents
Task 1
Task 2
Task 3
Task 4
Task 5
Task 6
Task 7
References
Figure 1: Answer 1 to 5
Figure 2: Answer 6-9
Figure 3: Answer 10-13
Figure 4: Answer 14 to 17
Figure 5: Answer 18 to 20
Figure 6: Advanced Insight

Task 1
ABC is an online multimedia company that works in the field of multimedia and video
uploads. The company has asked its Content Analyst to analyze the data thoroughly and
produce a set of suggestions for the company's Content Manager. For this purpose, the
selected dataset contains information about videos uploaded from 2006 to 2018
(Bhattacharya et al., 2016).
The dataset was originally obtained from Kaggle.com, and relevant changes were made to
make it more precise and easier for the Content Analyst to analyze and handle. It
combines information about videos uploaded from 2006 to 2018 in four countries: US,
Canada, France, and GB. The videos were uploaded in many categories and received
different responses from viewers. Viewer response was recorded as the comments, likes,
dislikes, and views each video received. The information about each video is organized
by publishing country, publish date (further divided into day, month, and year), the
channel on which it was uploaded, video id, and title. All of this information is used
and processed to produce the results of the analysis (Biju & Mathew, 2017).
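The fields described above can be sketched as a typed record. This is only an illustration in Python: apart from video_id and publish_country, which the report names, every column name here is an assumption, and the two sample rows are made up rather than taken from the dataset.

```python
import csv
import io
from dataclasses import dataclass


# Only video_id and publish_country are named in the report;
# the other field names are assumptions made for this sketch.
@dataclass(frozen=True)
class VideoRecord:
    video_id: str
    title: str
    channel: str
    category: str
    publish_country: str
    publish_year: int
    publish_month: int
    publish_day: int
    views: int
    likes: int
    dislikes: int
    comments: int


INT_FIELDS = ("publish_year", "publish_month", "publish_day",
              "views", "likes", "dislikes", "comments")


def parse_row(row):
    """Convert one CSV row (a dict of strings) into a typed VideoRecord."""
    values = dict(row)
    for name in INT_FIELDS:
        values[name] = int(values[name])
    return VideoRecord(**values)


# Made-up sample data, for illustration only.
SAMPLE_CSV = """video_id,title,channel,category,publish_country,publish_year,publish_month,publish_day,views,likes,dislikes,comments
a1,Example clip,Channel A,Music,US,2018,1,5,1000,100,5,20
b2,Another clip,Channel B,Sports,GB,2017,12,24,500,40,2,7
"""

records = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
```

Converting the count columns to integers at load time keeps later sums and comparisons numeric rather than string-based.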
The categories under which the videos were uploaded are:

Task 2
As required by the assigned task, the video data was analyzed and answers to the 20
questions were obtained:
Answer 1. Based on the video_id field, the dataset contains a total of 55885 uploaded
videos.
Answer 2. The uploaded videos fall into 18 different categories.
Answer 3. Summarizing the publish_country field shows that the dataset covers 4
different countries.
Answer 4. Videos were uploaded by 12360 different channels across the categories.
Answer 5. The visualization of channels by country shows that the top 3 countries by
number of channels are France, Canada, and the United States.
Figure 1: Answer 1 to 5
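Counts like those in Answers 1 to 5 can be reproduced with distinct-value counting. The following is a minimal Python sketch, not the tool used in the report: the tuple layout and the sample rows are made up for illustration.

```python
from collections import defaultdict

# Made-up rows: (video_id, category, publish_country, channel)
rows = [
    ("v1", "Music", "US", "ch1"),
    ("v2", "Music", "France", "ch2"),
    ("v3", "Sports", "France", "ch3"),
    ("v4", "Comedy", "Canada", "ch4"),
    ("v5", "Music", "GB", "ch1"),
    ("v6", "Sports", "Canada", "ch5"),
    ("v7", "Comedy", "France", "ch6"),
    ("v8", "Sports", "US", "ch7"),
    ("v9", "Music", "Canada", "ch8"),
    ("v10", "Music", "France", "ch9"),
]


def distinct_count(rows, index):
    """Number of distinct values found in one column of the rows."""
    return len({row[index] for row in rows})


def channels_per_country(rows):
    """Map each country to its number of distinct channels."""
    channels = defaultdict(set)
    for _video_id, _category, country, channel in rows:
        channels[country].add(channel)
    return {country: len(names) for country, names in channels.items()}


def top_countries(rows, n=3):
    """Countries ordered by channel count, highest first (ties by name)."""
    counts = channels_per_country(rows)
    return sorted(counts, key=lambda c: (-counts[c], c))[:n]


total_videos = distinct_count(rows, 0)
total_categories = distinct_count(rows, 1)
total_countries = distinct_count(rows, 2)
total_channels = distinct_count(rows, 3)
```

Using a set per country means a channel that uploads many videos in one country is counted once there, which matches the idea of counting channels rather than uploads.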
Answer 6. GB has the lowest number of channels, at 1624.
Answer 7. The US has 2207 channels.
Answers 8 and 9. The list is as follows:
Figure 2: Answer 6-9
Answer 10. The uploaded videos span 13 different years of publication.
Answer 11. In December, i.e., the last month of the year 2018, a total of 8397 videos
were uploaded on YouTube.
Answer 12. GB had the highest number of videos uploaded in the year 2018.
Answer 13. The most common upload time frame for France, Canada, and the US is
16:00-16:59; for GB the most common time frame is different, one hour ahead.
Figure 3: Answer 10-13
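The year count and the most common upload time frame per country (Answers 10 and 13) can be sketched as follows. This Python example assumes the publish time is available as an hour from 0 to 23; the rows are made up and do not come from the report's dataset.

```python
from collections import Counter, defaultdict

# Made-up rows: (publish_country, publish_year, publish_hour)
rows = [
    ("US", 2018, 16), ("US", 2018, 16), ("US", 2017, 9),
    ("Canada", 2018, 16), ("Canada", 2017, 16), ("Canada", 2018, 20),
    ("France", 2018, 16), ("France", 2006, 16), ("France", 2018, 11),
]


def distinct_years(rows):
    """Sorted list of the publication years present in the rows."""
    return sorted({year for _country, year, _hour in rows})


def busiest_hour_by_country(rows):
    """Map each country to the hour with the most uploads."""
    hours = defaultdict(Counter)
    for country, _year, hour in rows:
        hours[country][hour] += 1
    return {country: counter.most_common(1)[0][0]
            for country, counter in hours.items()}


def format_time_frame(hour):
    """Render an hour as a one-hour time frame such as 16:00-16:59."""
    return f"{hour:02d}:00-{hour:02d}:59"


years = distinct_years(rows)
busiest = busiest_hour_by_country(rows)
```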
Answer 14. Based on views, the most preferred video categories are Music, Sports, and
Film & Animation.
Answer 15. Based on views, the least preferred video categories are Music, Sports, and
Film & Animation.
Answer 16. "21 Savage, Offset, Metro Boomin - Ric Flair Drip" is the most liked video
in the dataset.
Answer 17. "#ProudToCreate: Pride 2018" is the least liked video in the dataset.
Figure 4: Answer 14 to 17
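Ranking categories by total views and picking out the most and least liked videos (Answers 14 to 17) could be done along these lines. This is a hedged Python sketch with made-up titles, categories, and numbers; it is not the procedure used in the report.

```python
from collections import Counter

# Made-up rows: (title, category, views, likes)
rows = [
    ("Song A", "Music", 900, 90),
    ("Match B", "Sports", 600, 30),
    ("Short C", "Film & Animation", 500, 45),
    ("Talk D", "News & Politics", 100, 4),
    ("Song E", "Music", 400, 60),
    ("Review F", "Autos & Vehicles", 50, 1),
]


def views_by_category(rows):
    """Sum the views of all videos in each category."""
    totals = Counter()
    for _title, category, views, _likes in rows:
        totals[category] += views
    return totals


def rank_categories(rows):
    """Categories ordered by total views, highest first (ties by name)."""
    totals = views_by_category(rows)
    return sorted(totals, key=lambda c: (-totals[c], c))


ranking = rank_categories(rows)
most_preferred = ranking[:3]    # head of the ranking
least_preferred = ranking[-3:]  # tail of the ranking
most_liked = max(rows, key=lambda r: r[3])[0]
least_liked = min(rows, key=lambda r: r[3])[0]
```

The most and least preferred categories are read from opposite ends of the same ranking, so a single sort serves both questions.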
Answer 18. The highest numbers of videos were uploaded on Saturday and Friday.
Answer 19. The lowest number of videos was uploaded on Monday.
Answer 20. The breakdown of likes by month shows that the most likes were gained from
November 2017 to June 2018, while likes from July 2006 to October 2017 stayed
consistently within the same range.
Figure 5: Answer 18 to 20
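The weekday and monthly breakdowns behind Answers 18 to 20 can be sketched with two simple groupings. The Python example below assumes each video carries a full publish date; the dates and like counts are made up.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Made-up rows: (publish_date, likes)
rows = [
    (date(2018, 1, 5), 10),   # Friday
    (date(2018, 1, 6), 20),   # Saturday
    (date(2018, 1, 13), 5),   # Saturday
    (date(2018, 2, 5), 7),    # Monday
    (date(2017, 12, 1), 3),   # Friday
]


def uploads_by_weekday(rows):
    """Count uploads per weekday name (date.weekday() has Monday as 0)."""
    return Counter(WEEKDAYS[d.weekday()] for d, _likes in rows)


def likes_by_month(rows):
    """Sum likes per (year, month) pair."""
    totals = Counter()
    for d, likes in rows:
        totals[(d.year, d.month)] += likes
    return totals


weekday_counts = uploads_by_weekday(rows)
monthly_likes = likes_by_month(rows)
```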

Task 3
Figure 6: Advanced Insight
The following questions were used to analyze the data for advanced insight:
1. Display the variation of comments received according to the category.
2. How many videos were removed due to error?
3. How many videos were uploaded in each category?
4. Display the variation of views for each category according to time frames.
5. Display the variation of views received according to the category.
The answer to each question was obtained, and the results are as follows:
A1. The most comments were received in the music, entertainment, and comedy
categories, and the fewest in the shows, trailers, and movies categories.
A2. The analysis found that a total of two videos were removed due to error.
A3. The image below shows the number of videos uploaded in each category:
A4. The variation of views across categories according to the time frame is as
follows:
The green line in the graph shows the sum of views across categories, the blue bars
show the categories, and the x-axis shows the time frame.
A5. The variation of views across categories follows user preferences: the most
preferred videos have the most views, and the least preferred videos have the fewest
views (Yu & Schroeder, 2018).
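The category-by-time-frame breakdown described in A4 amounts to a two-level grouping of views. A minimal Python sketch follows, assuming the time frame is the publish hour (0 to 23); the rows are made up for illustration.

```python
from collections import defaultdict

# Made-up rows: (category, publish_hour, views)
rows = [
    ("Music", 16, 500),
    ("Music", 17, 300),
    ("Sports", 16, 200),
    ("Comedy", 9, 100),
    ("Music", 16, 50),
]


def views_by_category_and_hour(rows):
    """Build a nested table: category -> hour -> summed views."""
    table = defaultdict(lambda: defaultdict(int))
    for category, hour, views in rows:
        table[category][hour] += views
    return table


def total_views_per_hour(table):
    """Collapse the table to total views per hour across all categories."""
    totals = defaultdict(int)
    for per_hour in table.values():
        for hour, views in per_hour.items():
            totals[hour] += views
    return dict(totals)


table = views_by_category_and_hour(rows)
hourly_totals = total_views_per_hour(table)
```

The per-hour totals correspond to a summed-views line, while the nested table holds the per-category values for each time frame.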

Task 4
The advanced insights were carried out to examine the dataset in more depth, to draw
conclusions from the research, and to gather suggestions for better content
management. The following points can be concluded from them:
The graph displayed above shows the number of comments received for each video
category in the dataset. This gives insight into the categories on which viewers
prefer to comment. The more comments a category receives, the more recommendations
for improvement it yields.
The number of views in a video category reflects viewers' interest in that category.
The most preferred category gets the highest number of views, and the least preferred
category gets a low number of views. This reflects how well liked each video category
is. The list in the image shows the views for the different categories (Bou-Franch &
Garcés-Conejos Blitvich, 2014).
The number of videos in each category was obtained for the third question of the
advanced insight. The number of videos was higher for categories such as music,
entertainment, and sports, and lower for categories such as news, politics, and auto &
vehicles. This suggests that the categories with fewer videos have fewer viewers and
the categories with more videos have more viewers (Inostroza-Ponta, Berretta, Moscato,
& Brusic, 2011).
The image displayed below shows the sum of views and the category according to the
time frame. The category-wise views show the most favored categories, and the
breakdown by time frame shows which kinds of category were viewed in which time frame
and how views vary across the 24 time frames.
The image displayed above summarizes the videos that were removed because of errors.
It shows that the removed videos are few in number compared with the total number of
videos uploaded on YouTube. This indicates that the system is working well and that
there is no need to change the upload and maintenance system for the channels and
videos.
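To put the removal count in proportion, the share of removed videos can be worked out from two figures already stated in the report: 55885 videos in total (Answer 1 in Task 2) and two removed videos (A2 in Task 3). A short Python check of the arithmetic:

```python
TOTAL_VIDEOS = 55885   # Answer 1 in Task 2
REMOVED_VIDEOS = 2     # A2 in Task 3

# Fraction of all uploaded videos that were removed due to error.
removed_share = REMOVED_VIDEOS / TOTAL_VIDEOS
removed_share_text = f"{removed_share:.4%}"

print(removed_share_text)  # prints 0.0036%
```

Roughly 0.0036% of the videos were removed, which is about one in every 28,000 uploads.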

Task 5
Based on the analysis of the dataset of videos uploaded on YouTube, the following
suggestions can be made to the Content Manager of the ABC online multimedia company:
1) The number of videos in the non-preferred categories should be reduced.
2) The number of videos in the preferred categories should be increased.
3) The videos with the most views should be placed in the trending category.
4) The most liked videos should be added to the suggested categories.
5) System updates and maintenance should be carried out in the time frames when the
number of active users is low.
6) During the hours in which users are most active, the network should be managed and
maintained accordingly (Lacefield, 2009).