ITECH 2201 Cloud Computing: Big Data Analysis and Applications
Added on 2023/06/11
Homework Assignment
AI Summary
This document presents a student's solution to the ITECH 2201 Cloud Computing Week 6 workbook assignment focusing on Big Data. The solution covers key areas such as defining data science, understanding the characteristics of big data (the 7 V's), and exploring big data platforms. It discusses how big data can be acquired, organized, and analyzed, and provides examples of Google's data products and their effective use of large-scale data. The solution also addresses the limitations of traditional relational databases for storing big data, introduces NoSQL databases and MapReduce, and lists features of Amazon’s S3 service. Furthermore, it identifies industries that benefit from big data applications and statistical analysis tools used in big data analysis.

ITECH 2201 Cloud Computing
School of Science, Information Technology & Engineering
Workbook for Week 6 (Big Data)
Please note: Every effort was made to ensure the given web links are accessible. However,
if they are broken, please use any appropriate video/article and reference it in your answer
Part A (4 Marks)
Exercise 1: Data Science (1 mark)
Read the article at http://datascience.berkeley.edu/about/what-is-data-science/ and
answer the following:
What is Data Science?
Data science is the study of where information comes from, what it represents and how it can
be converted into a valuable resource for business and IT strategies. It helps to identify
patterns in structured and unstructured data.
According to IBM's estimate, what percentage of the data in the world today has been
created in the past two years?
According to IBM's estimate, 90% of the data in the world today has been created in the
last two years.
____________________________________________________________________________
What is the value of petabyte storage?
One petabyte is 10^15 bytes, which equals 1,000 terabytes or 1,000,000 gigabytes.
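The arithmetic behind these figures can be checked with a minimal sketch. It assumes decimal (SI) units, where each step up is a factor of 1,000; the constant and function names are illustrative only.

```python
# Decimal (SI) storage units: each unit is 1,000 times the previous one.
BYTES_PER_GB = 10**9
BYTES_PER_TB = 10**12
BYTES_PER_PB = 10**15


def petabytes_to(unit_bytes: int, petabytes: int = 1) -> int:
    """Convert whole petabytes into another unit, given that unit's size in bytes."""
    return petabytes * BYTES_PER_PB // unit_bytes


terabytes_in_pb = petabytes_to(BYTES_PER_TB)
gigabytes_in_pb = petabytes_to(BYTES_PER_GB)
print(terabytes_in_pb, gigabytes_in_pb)
```

Note that some tools report binary units instead, in which one pebibyte is 2^50 bytes, slightly more than 10^15.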
_______________________________________________________________________
CRICOS Provider No. 00103D Insert file name here Page 1 of 8

For each course, both foundation and advanced, listed at
http://datascience.berkeley.edu/academics/curriculum/ briefly state (in 2 to 3 lines) what
it offers, based on the given course description as well as the video. The purpose
of this question is to understand the different streams available in Data Science.
Students who are proficient in object-oriented programming (OOP) take 12 units of foundation
coursework. Students who are not proficient in OOP are assigned 15 units.
The foundation courses include subjects such as applied machine learning, statistics for
data science, data analysis and data engineering.
The advanced courses include subjects such as data visualization, machine learning at
scale, scaling up big data, causality and experiments, human values and deep learning.
Exercise 2: Characteristics of Big Data (2 marks)
Read the following research paper from IEEE Xplore Digital Library
Ali-ud-din Khan, M.; Uddin, M.F.; Gupta, N., "Seven V's of Big Data understanding Big
Data to extract value," 2014 Zone 1 Conference of the American Society for Engineering
Education (ASEE Zone 1), pp. 1-5, 3-5 April 2014
and answer the following questions:
Summarise the motivation of the author (in one paragraph)
The authors argue that big data can solve a number of problems that industries face today. It
is present everywhere and benefits large and small businesses alike, as well as entertainment,
law enforcement and film making. Even huge organizations such as Google and Facebook rely on
it. Google developed technologies such as MapReduce and Bigtable to handle it, and these
inspired the open-source Hadoop framework. The authors further explain the significance of big
data in other industries such as finance, politics, sustainability, biological research and
education.
_______________________________________________________________________
____________________________________________________________________
What are the 7 v’s mentioned in the paper? Briefly describe each V in one paragraph.
The 7 Vs in the paper are volume, velocity, variety, veracity, validity, volatility and value.
Volume refers to the sheer amount of big data created from numerous sources such as research
studies, images, video, text and audio. Variety reflects the fact that the data can also be
taken from government documents, telemetry, web pages and social media, in both structured and
unstructured forms. Velocity states that the system handling big data should have
infrastructure capable of processing it at high speed. Veracity concerns how trustworthy and
accurate the data is, and validity whether it is correct for its intended use. Volatility
describes how long the data remains relevant and needs to be stored. Value is the useful
insight that can be extracted from the data.
__________________________________________________________________________
Explore the author’s future work by using the reference [4] in the research paper.
Summarise your understanding how Big Data can improve the healthcare sector in 300
words.
Big data can be used in the healthcare sector for a number of purposes: personalized
medicine and treatment, detecting and preventing unscrupulous behaviour by employees, and
improving treatment services.
_______________________________________________________________________
Exercise 3: Big Data Platform (1 mark)
In order to build a big data platform - one has to acquire, organize and analyse the big
data. Go through the following links and answer the questions that follow the links: Check
the videos and change the wordings
− http://www.infochimps.com/infochimps-cloud/how-it-works/
− http://www.youtube.com/watch?v=TfuhuA_uaho
− http://www.youtube.com/watch?v=IC6jVRO2Hq4
− http://www.youtube.com/watch?v=2yf_jrBhz5w
Please note: You are encouraged to watch all the videos in the series from Oracle.
How to acquire big data for enterprises and how it can be used?
Big data can be acquired from several sources such as social media and personal and customer
information. Enterprises can use it to enhance their operational productivity and as a basis
for innovation. It allows data researchers to discover models and patterns in the data and to
apply them alongside the existing applications used in the enterprise.
_______________________________________________________________________
How to organize and handle the big data?
Big data can be organized and handled by limiting the amount of information that is
accessible to employees. Virtualization can also be used to organize data
(Blech et al., 2014). Big data should be handled carefully: the amount of information
shared with third-party applications should be monitored, as should the virtual database
that holds the data.
_______________________________________________________________________
What are the analyses that can be done using big data?
A number of analyses can be done with big data. Illegal tax activity, deportation and threats
from hostile parties can be analyzed with big data (Fitzgerald & Barenboim, 2017). Customer
information from YouTube, Facebook and Twitter can be analyzed and used in media and
entertainment. Amazon and Spotify use big data analytics to analyze patterns in user
behaviour (Chen & Zhang, 2014). It can also be used for analyzing blood samples and for
cost-benefit analysis of medical benefits.
_______________________________________________________________________

Part B (4 Marks)
Part B answers should be based on well cited article/videos – name the references used
in your answer. For more information read the guidelines as given in Assignment 1.
Exercise 4: Big Data Products (1 mark)
Google is a master at creating data products. Below are few examples from Google.
Describe the below products and explain how the large scale data is used effectively in
these products.
a. Google’s PageRank
PageRank is the Google algorithm used to measure the importance of a web page. It checks the
number and quality of the links pointing to a page and assigns the page a ranking
accordingly, which affects how much search traffic the page receives (Pabari, 2014).
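The idea of ranking a page by the number and quality of its incoming links can be sketched with a simple power iteration. This is a minimal illustration, not Google's production system: the four-page toy web, the function name and the iteration count are assumptions, and the damping factor of 0.85 is the conventional textbook value.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate PageRank scores.

    `links` maps each page to the list of pages it links to.
    Every linked page is assumed to appear as a key in `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page splits its current rank evenly among its links.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: C is linked to by A, B and D; nothing links to D.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy web, page C collects links from three pages and ends up ranked highest, while D, which no page links to, ends up lowest.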
b. Google’s Spell Checker
Spell Checker is a Google product that has long been used to check and correct the spelling
of the words users type (John Walker, 2014).
c. Google’s Flu Trends
This product was used to estimate flu activity in more than 20 countries from search
queries. It was widely criticized for overestimating flu activity and is no longer updated.
d. Google’s Trends
This product is used for assessing how often a term is searched for in Google web search over time.
Like Google, Facebook and LinkedIn also use large-scale data effectively. How?
Facebook, LinkedIn and Google use big data to analyze user interactions and user behaviour.
_______________________________________________________________________

Exercise 5: Big Data Tools (2 marks)
Briefly explain why a traditional relational database (RDBMS) is not effectively used to
store big data?
An RDBMS is not well suited to storing big data because of the size of the data, which can
reach petabytes. Moreover, the data comes in varied forms such as images, sound and video,
which do not fit a fixed relational schema. Big data is also generated at a speed that a
traditional RDBMS is not designed to ingest.
_______________________________________________________________________
What is NoSQL Database?
A NoSQL database provides a mechanism for storing and retrieving data that is modeled by
means other than the tabular relations used in relational databases.
___________________________________________________________________
Name and briefly describe at least 5 NoSQL Databases
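First are wide-column stores, where data is kept in columns grouped into families (for
example Apache Cassandra). Next are document databases, where data is kept as JSON-like
documents (for example MongoDB). Third are multi-model databases, which support several data
models in one system (for example ArangoDB). Fourth are graph databases, which store nodes
and the relationships between them (for example Neo4j). Last are key-value stores, which look
up a value by a unique key (for example Redis).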
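The difference between these data models can be illustrated with plain Python structures. This is only an in-memory sketch of how each model shapes its data; it does not use any real database API, and all keys, names and values are hypothetical.

```python
import json

# Key-value store: an opaque value looked up by a single unique key.
key_value = {"session:42": "alice"}

# Document store: each record is a self-describing, JSON-like document.
document = {"_id": 1, "name": "Alice", "tags": ["admin", "editor"]}

# Wide-column store: row key -> column family -> individual columns.
wide_column = {
    "user:1": {
        "profile": {"name": "Alice", "city": "Ballarat"},
        "activity": {"logins": 17},
    }
}

# Graph store: nodes and the relationships (edges) between them.
graph = {"alice": ["bob"], "bob": ["carol"], "carol": []}


def friends_of_friends(edges, start):
    """Follow two relationship hops, a typical graph-style query."""
    found = {fof for friend in edges[start] for fof in edges[friend]}
    return sorted(found - {start})


print(key_value["session:42"])
print(json.dumps(document))
print(wide_column["user:1"]["activity"]["logins"])
print(friends_of_friends(graph, "alice"))
```

A multi-model database would let one system hold several of these shapes side by side.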
___________________________________________________________
What is MapReduce and how it works?
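MapReduce is a programming model for processing large data sets in parallel across a
cluster. Map tasks (mappers) read the input and emit intermediate key-value pairs. These
pairs are shuffled and grouped by key, then passed to reduce tasks (reducers), which
aggregate them into the final output.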
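The flow of data from mappers to reducers can be sketched with the classic word-count example. This is a single-process illustration of the model, assuming a tiny in-memory input; a real framework such as Hadoop distributes the same three phases across many machines. The function names are illustrative.

```python
from collections import defaultdict


def map_phase(document):
    """Mapper: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all intermediate values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    intermediate = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(intermediate).items())


counts = word_count(["big data is big", "data is everywhere"])
print(counts)
```

Because each mapper works on its own document and each reducer on its own key, both phases can run in parallel without sharing state.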
____________________________________________________________________
Briefly describe some notable MapReduce products (at least 5)
Some notable MapReduce applications are:
Distributed grep
Measure of geographical information
Digital surface as well as terrain models
URL access frequency
Inverted index
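Two of the applications listed above, the inverted index and URL access frequency, can be sketched with the same map and reduce pattern. This is a single-process illustration under stated assumptions: the helper `run_mapreduce`, the sample documents and the sample log are all hypothetical.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Run mapper over every record, group values by key, then reduce each group."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            grouped[key].append(value)
    return {key: reducer(key, values) for key, values in grouped.items()}


# Inverted index: map each word to the documents that contain it.
def index_mapper(record):
    doc_id, text = record
    for word in set(text.lower().split()):
        yield word, doc_id


def index_reducer(word, doc_ids):
    return sorted(doc_ids)


# URL access frequency: count how often each URL appears in a request log.
def url_mapper(url):
    yield url, 1


def url_reducer(url, counts):
    return sum(counts)


docs = [("d1", "big data tools"), ("d2", "data products")]
index = run_mapreduce(docs, index_mapper, index_reducer)
hits = run_mapreduce(["/home", "/about", "/home"], url_mapper, url_reducer)
print(index["data"], hits)
```

Only the mapper and reducer change between the two jobs; the grouping step in the middle stays the same.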
___________________________________________________________________
Amazon’s S3 service lets users store large chunks of data online. List 5
features of Amazon’s S3 service.
The features are:
1. Vendor support
2. Flexible management
3. Query in place
4. Unmatched durability
5. Comprehensive security
_______________________________________________________________________
Getting the concise, valuable information from a sea of data can be challenging. We need
statistical analysis tool to deal with Big Data. Name and describe some (at least 3)
statistical analysis tools.
MS Excel is a good statistical analysis tool that provides the advantage of adaptability and
control. It is easily accessible and does not require extra training to master.
MATLAB is another tool that performs statistical analysis well and also offers a low
licensing cost.
SPSS is used for data management and offers a wide variety of graphs and diagrams.
______________________________________________________________________
Exercise 6: Big Data Application (1 mark)
Name 3 industries that should use Big Data
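Social media and web search companies such as Facebook, Google and Twitter should use big
data, as should the healthcare and finance industries discussed earlier in this workbook.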