ITECH 2201: Cloud Computing - Big Data Workbook Assignment, Week 6

Added on  2021/05/31

ITECH 2201 Cloud Computing
School of Science, Information Technology & Engineering
Workbook for Week 6 (Big Data)
Please note: All efforts were taken to ensure the given web links are accessible.
However, if they are broken, please use any appropriate video/article and reference it in
your answer.
CRICOS Provider No. 00103D Insert file name here Page 1 of 24
Part A
Exercise 1: Data Science
Read the article at http://datascience.berkeley.edu/about/what-is-data-science/ and answer the following:
What is Data Science?
Data science refers to the collection, organization, and analysis of the avalanche of data created in
recent years. It allows the identification of patterns in data and enables people with advanced
training to use those patterns to create social and business value. The
emergence of "big data" also enables us to comprehend these phenomena more deeply, ranging from
biological systems and economic behaviour to human social networks.
According to IBM's estimation, what percent of the data in the world today has been created
in the past two years?
According to IBM's estimate, 90 percent of the data in the world today was created in the past
two years.
What is the value of petabyte storage?
A petabyte is a million gigabytes, that is, 10 to the 15th power bytes.
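The unit conversion can be verified directly (this uses decimal SI units, as the article does):

```python
# Quick sanity check of the unit arithmetic, in decimal (SI) units.
PETABYTE = 10 ** 15   # 1 PB = 10^15 bytes
GIGABYTE = 10 ** 9    # 1 GB = 10^9 bytes

print(PETABYTE // GIGABYTE)  # → 1000000, i.e. a million gigabytes
```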
For each course, both foundation and advanced, that you find at
http://datascience.berkeley.edu/academics/curriculum/, briefly state (in 2 to 3 lines) what they offer.
Based on the given course description as well as the video, the purpose of this question is to
understand the different streams available in Data Science.
Foundation course:
The foundation course, or basic curriculum, covers the essential skills and knowledge that data
science offers students. It includes storing, retrieving, designing, and analysing data, and
provides students with data visualization and practical application knowledge (Khan, Fahim Uddin &
Gupta, 2014).
Advanced course:
The advanced course plays an important role in building a deep understanding of the value and
application of data science. Its analytical methods comprise complex skills that address
big-data-related issues through experimental design and data visualization, helping students
explore and become aware of the exact usage of data science.
Exercise 2: Characteristics of Big Data
Read the following research paper from IEEE Xplore Digital Library
Ali-ud-din Khan, M.; Uddin, M.F.; Gupta, N., "Seven V's of Big Data: understanding Big Data to
extract value," 2014 Zone 1 Conference of the American Society for Engineering Education
(ASEE Zone 1), pp. 1-5, 3-5 April 2014, and answer the following questions:
Summarise the motivation of the author (in one paragraph)
As the author describes, the motivation comes from the fact that big data has become an important
part of life and hides solutions to many industry problems. The main reason for the paper is the
authors' view that big data is a key area of technology; they describe it as a "Big Data ocean".
As billions of records are generated every day, big data has become a defining trend.
What are the 7 v’s mentioned in the paper? Briefly describe each V in one paragraph.
1) Velocity: Velocity is discussed from two perspectives. The first is the rate at which data
arrives, for which the enterprise needs to prepare its technology and database engine processes.
The other is moving big data into large storage, which requires a quick response as the data
arrives.
2) Variety: Big data comes in diverse forms, such as video and text, which is a main difference
between big data and traditional data. The challenging part is the complexity, which can lead to
erroneous data integration.
3) Volume: Volume means the size of the data created from many sources, including audio, text,
video, research reports, spatial images, social networks, weather forecasts, and crime reports, to
mention a few.
4) Veracity: Compared with traditional data, which can be standardized, veracity concerns the
reliability of data. Big data often comes directly from users, whose reliability is low;
therefore, cleaning data is an important step for big data.
5) Volatility: In big data, volatility refers to the data retention policy: how long data must be
kept. This is easily implemented in a relational database, but the variety, velocity, and volume
of data in the big data world make it harder.
6) Value: Value is the most significant V because it is the desired outcome of big data analysis
and the result of the preceding analysis steps.
7) Validity: Validity means the accuracy and correct usage of the data; data that is valid and
real for one purpose may not be valid in a different situation (Corea, 2016).
Explore the author’s future work by using the reference [4] in the research paper. Summarise
your understanding how Big Data can improve the healthcare sector in 300 words.
As the author states, the cost of owning and managing data can exceed its value, and governance
mechanisms depend to a large extent on that value. Structures and strategies are required that
weigh the value of extracted project information against its cost: data can be arranged in tiers,
where the higher tiers carry less risk, higher storage costs, and higher levels of protection,
ensuring these levels are related to cost. The advent of digital technology has provided many
benefits for healthcare providers. One of the key advances is the use of big data in the medical
business. Using big data can help medical industry participants run more effective operations and
gain insight into patients and their well-being. The healthcare business faces a variety of
challenges, from new disease outbreaks to maintaining optimal operational efficiency. Big data
analytics can help solve these healthcare challenges. By using the large amounts of information in
the healthcare industry, such as clinical, financial, research and development, operational, and
management data, big data can yield meaningful insight and improve the operational effectiveness
of the business. Healthcare companies can lower medical costs and provide better services by
finding better ways to treat diseases: some drugs seem to work for some people but not others, and
there are many things to observe in a single genome. It is impossible to study all of this in
detail manually, but big data can reveal unknown correlations, hidden patterns, and insights by
examining huge amounts of information. In future, it can be used to create drugs tailored to a
patient's genome to obtain the best therapeutic effect. Combining all patients' electronic health
records, dietary information, social factors, and so on with DNA sequencing makes it possible to
recommend customised treatment and personalised medicine. Aurora Health Care has begun a proof of
concept for this, and they have been able to reduce the readmission rate by 10% and save
$6 million annually (Abouelmehdi, Beni-Hessane & Khaloufi, 2018).
Exercise 3: Big Data Platform
In order to build a big data platform, one has to acquire, organize and analyse the big data. Go through the
following links and answer the questions that follow the links:
http://www.infochimps.com/infochimps-cloud/how-it-works/
http://www.youtube.com/watch?v=TfuhuA_uaho
http://www.youtube.com/watch?v=IC6jVRO2Hq4
http://www.youtube.com/watch?v=2yf_jrBhz5w
Please note: You are encouraged to watch all the videos in the series from Oracle.
How to acquire big data for enterprises and how it can be used?
From the videos mentioned, as well as Oracle's article, the main change to infrastructure is in the
acquisition phase. Two major use cases must be considered. First, for social media updates, forum
comments, and blogs, companies can simply extract overnight or weekly trend analysis. Second,
enterprises that want to update, study, and store information for online profiles must continuously
monitor sensors and feeds. In either case, a NoSQL database may be used to store big data because
it is extensible and flexible, and the Hadoop Distributed File System (HDFS) may be used for batch
data. In this approach, the system aims to capture all information without parsing the data or
categorizing it into a fixed schema. As a result, data can be accessed easily through simple keys
from customer-facing applications.
How to organize and handle the big data?
Data stored in HDFS needs to be pre-processed, organized, and converted so that it can be loaded
into a data warehouse alongside traditional enterprise data, or stored in NoSQL. Big data is,
moreover, always in different formats. A procedure called sessionization selects the specific
information of interest: it translates behaviour patterns and other related information into useful
data so that it can then be aggregated and loaded into relational database systems.
What are the analyses that can be done using big data?
Big data analysis is done in a distributed environment because deeper analysis, such as data
mining and statistical analysis, requires infrastructure that spans the various systems storing
the data. Analysts can drill down into large amounts of data, analytical models can make better
decisions automatically, and, finally, responses to changing behaviour can be delivered faster
(Jee & Kim, 2013).
Part B (4 Marks)
Part B answers should be based on well-cited articles/videos – name the references used in your answer. For
more information read the guidelines as given in Assignment 1.
Exercise 4: Big Data Products (1 mark)
Google is a master at creating data products. Below are a few examples from Google. Describe the below
products and explain how large-scale data is used effectively in these products.
a. Google’s PageRank
PageRank treats hyperlinks between webpages and blogs as "votes": a link from one page to another
counts as a vote for the linked page, and the weight of each vote is derived from the importance
of all the pages linking to the voter. A page with no inbound links contributes little to the
ranking hierarchy. In 2005, Google also introduced the nofollow link attribute, which excludes a
link from voting, as a countermeasure against spam.
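The "links as votes" idea can be sketched with a tiny power iteration. This is an illustrative toy, not Google's production algorithm; the three-page graph and the damping factor of 0.85 are assumptions made for the example.

```python
# Minimal PageRank power iteration over a tiny link graph.
def pagerank(links, damping=0.85, iterations=50):
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {}
        for p in pages:
            # A page's rank is shared equally among the pages it links to.
            incoming = sum(rank[q] / len(links[q])
                           for q in pages if p in links[q])
            new_rank[p] = (1 - damping) / len(pages) + damping * incoming
        rank = new_rank
    return rank

# Tiny example: A and B both "vote" for C, so C ends up ranked highest.
graph = {"A": ["C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)
print(max(ranks, key=ranks.get))  # → C
```

Note how B, which nobody links to, gets the lowest rank even though it casts a vote, while C accumulates rank from two voters.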
b. Google’s Spell Checker
The spell checker is used to correct the spelling of words. It is not only a standalone
application: it is also built into electronic dictionaries, search engines, word processors, and
email clients. It can likewise be used to normalise words when comparing them during stem
analysis. Google's spell checker makes effective use of large-scale data by learning from the huge
volume of search queries it sees, including how users correct their own misspelled searches.
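One well-known way a spell checker can exploit corpus statistics is to propose candidates one edit away from the input and pick the one most frequent in a corpus (the approach popularised by Peter Norvig). The sketch below uses a made-up three-word corpus; a real system learns frequencies from query logs at vastly larger scale.

```python
# Frequency-based spell correction sketch over a toy corpus.
import string

CORPUS_COUNTS = {"spelling": 120, "spilling": 8, "checker": 95}  # toy counts

def edits1(word):
    """All strings one insert/delete/replace/transpose away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def correct(word):
    """Return the most frequent known word within one edit, if any."""
    if word in CORPUS_COUNTS:
        return word
    candidates = edits1(word) & CORPUS_COUNTS.keys()
    return max(candidates, key=CORPUS_COUNTS.get) if candidates else word

print(correct("speling"))  # → spelling
```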
c. Google’s Flu Trends
Google Flu Trends was a web service operated by Google that provided estimates of influenza
activity in more than 25 countries. It made its historical estimates and current research data
available for download (Kościelniak & Puto, 2015).
d. Google’s Trends
Google Trends is a web-based tool built on Google search. It shows how often search terms are
entered, across different languages and different regions of the world, relative to total search
volume.
Like Google, Facebook and LinkedIn also use large-scale data effectively. How?
It is a well-known fact that these sites generate a huge amount of information, because they are
social platforms, and all of this information can be analysed against users' behaviour patterns to
generate recommendations. For example, Facebook uses activity data to suggest items the user may
want to purchase or events they may want to attend, based on what they post and search for.
Exercise 5: Big Data Tools
Briefly explain why a traditional relational database management system (RDBMS) is not effectively used to store big data?
According to XYZ, there are three major reasons why an RDBMS is not effective for storing big
data. First, the size of data has drastically increased to the petabyte level, and processing such
a large amount of data in an RDBMS is a tedious task. Second, most big data is unstructured or
semi-structured; this information frequently comes from social media, texts, videos, and emails,
and unstructured information is beyond the scope of an RDBMS because a relational database cannot
parse it. An RDBMS is designed for structured data such as financial records or sensor logs.
Third, the high velocity of big data is another problem: an RDBMS is designed for data retention
rather than rapid growth (Hoskins, 2014).
What is NoSQL Database?
A NoSQL database is defined as "a largely distributed, non-relational database that enables
organizations to analyse extremely high volumes and varied types of data quickly and ad hoc". It
is also known as a cloud database or a big data database because of the huge volume of data
generation and storage it supports. Another name is non-relational database.
Name and briefly describe at least 5 NoSQL Databases
Cassandra: Facebook originally developed Cassandra, which then became an Apache open-source
project. It is ideal for social-networking cloud-computing databases. It is a non-relational
database whose data model draws on Google's Bigtable.
Lucene: This is one of the Apache Software Foundation's Jakarta subprojects. It is an open-source
full-text search toolkit: not a complete full-text search engine, but a full-text search engine
architecture on which engines can be built.
Oracle's NoSQL database: Oracle's big data machine bundles the NoSQL database with integrated
Hadoop, the R language, a Hadoop loader, and adapters between the Oracle database and Hadoop.
Oracle announced it as a big data appliance at Oracle OpenWorld on 4 October.
HBase: Called the Hadoop database, HBase provides a high-performance, column-oriented, highly
reliable, and scalable distributed storage system. The underlying storage runs on clusters of
commodity servers. HBase is an open-source implementation of Google's Bigtable and uses a similar
file-storage model.
Bigtable: a non-relational database consisting of a sparse, distributed, persistent
multidimensional sorted map. Because petabyte-level information is processed across many machines,
it is very reliable (Bughin, 2016).
What is MapReduce and how does it work?
MapReduce is a programming model used for parallel computation over large data sets. It lets
programmers write programs for distributed systems in a style similar to ordinary sequential
programming. A map function is applied to a set of key-value pairs to produce a new set of
intermediate pairs; the framework then groups all values that share a key and passes each group to
a concurrent reduce function, which combines them into the final output.
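The map/shuffle/reduce flow can be sketched in miniature with the classic word-count example. A real framework such as Hadoop distributes these phases across many machines; this toy runs them in a single process.

```python
# Word-count sketch of the MapReduce model: map emits (key, value)
# pairs, shuffle groups values by key, reduce combines each group.
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word in the document."""
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    """Group the emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Sum the counts collected for each word."""
    return {key: sum(values) for key, values in grouped.items()}

docs = ["big data big value", "big data analysis"]
pairs = [pair for doc in docs for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
print(counts["big"])  # → 3
```

Because each map call and each reduce call is independent, a framework can run them on different machines and only the shuffle step moves data between them.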
Briefly describe some notable MapReduce products (at least 5)
CouchDB: This is Apache open-source database software that focuses on ease of use and a scalable
architecture.
Apache Hadoop: open-source big data software that provides a MapReduce programming framework; it
is a scalable platform for cloud computing.
Disco Project: a lightweight, open-source framework for distributed computing.
Riak: a scalable, easy-to-operate NoSQL database that is also distributed.
Infinispan: software developed by Red Hat, a distributed key-value NoSQL data store and cache
(Vis, 2013).
Amazon’s S3 service lets you store large chunks of data on an online service. List some 5 features of
Amazon’s S3 service.
Amazon's S3 service has the following features, as described below:
Version Control: Versioning allows every version of each object in a bucket to be saved and
retrieved. It is used to improve the durability of storage and to recover deleted or overwritten
objects.
Life cycle: Objects with a life-cycle policy can be automatically deleted, or transitioned to
Glacier storage, at a specific time.
Tagging: Bucket tags mark cost allocation for AWS billing, making it easy to track AWS costs and
organize buckets.
Request pricing: AWS charges for requests made against a bucket, such as storing and accessing
objects or listing files. Request price is an important factor to consider when dealing with a
large number of documents.
Reduced Redundancy Storage (RRS): Reduced-redundancy storage can be enabled and disabled per
bucket to decrease the cost of storing reproducible, non-critical data.
Getting concise, valuable information from a sea of data can be challenging. We need statistical
analysis tools to deal with Big Data. Name and describe some (at least 3) statistical analysis tools.
Some statistical analysis tools are:
[R]: R is a programming language typically driven from a command-line interface. It is built
around functions designed for statistical computing, which makes analyses precise and quick to
develop. R can run on various operating systems.
EXCEL spreadsheet: one of the Microsoft Office products and a powerful piece of software. Its
tables and charts are easy to operate and manage. It is also used for data analysis and
statistical analysis, though it has the shortcoming of slowing down on very large data sets.
SPSS Statistics: SPSS Statistics does not require extensive programming knowledge. In addition to
the syntax editor, there is a point-and-click graphical interface. It is an IBM statistical tool
for analysis and gives some control over the statistical output.
Exercise 6: Big Data Application (1 mark)
Name 3 industries that should use Big Data – justify your claim in 250 words for each industry using
proper references.
Financial industry: From the perspective of existing customers, banks use investment
characteristics, asset management, banking services, product financial strategies, and so on to
build demographic customer segmentation and to analyse insurance demographics, providing one-stop
financial customer solutions and extracting the most value. Big data is also used to detect
duplicate transactions in the workflow, and blockchains are used alongside big data to improve
security, maintain consistent compliance archives, and support blockchain analysis.
Insurance: This is one of the industries that needs big data services: it can reduce the time to
process complex claims to within 10 minutes, and it needs to eliminate millions of dollars in
leakage and fraud. Big data also helps insurers become customer-centric, profitable companies.
Another important use is setting premiums: insurers price premiums to cover risk profitably while
suiting the customer's budget. The whole industry is based on the principle of risk.
Retail industry: Retail uses data-driven cognitive technology to improve the customer experience.
Big data is also used to analyse social media data to improve product design and marketing and to
provide quality services. Big data analytics in the retail process can predict in-demand products,
identify interested customers, research the best forecasting trends, and optimize pricing to gain
a competitive advantage for the products to be sold.
From your lecture and also based on the below given video link:
https://www.youtube.com/watch?v=_sXkTSiAe-A
Write a paragraph about memory virtualization.
Memory virtualization separates volatile random-access memory (RAM) resources from
individual systems in the data centre and then aggregates these resources into a virtualized
memory pool available to any computer in the cluster, whether to the operating system or to
applications running on it. The distributed memory pool can then be used as a cache for CPU or
GPU applications, as a messaging layer, or as a large shared memory resource. Memory
virtualization allows networked and distributed servers to share a memory pool to overcome
physical memory limitations, a common bottleneck for software performance. By integrating this
functionality into the network, applications can use a large amount of memory to improve overall
performance and system utilization, improve memory-usage efficiency, and enable new use cases.
Software on each memory-pool node (server) allows nodes to connect to the memory pool, contribute
memory, and store and retrieve data.
Watch the below mentioned YouTube link:
https://www.youtube.com/watch?v=wTcxRObq738
Based on the video answer the following questions:
What is RAID 0?
RAID 0, also known as disk striping, is a technique for splitting files and
distributing the data across all the disk drives in a RAID group. One disadvantage of RAID 0 is
that it has no parity: if a drive fails, there is no redundancy and all data is
lost.
Describe Striping, Mirroring and Parity.
Striping is the RAID level that most confuses beginners and needs to be well understood
and explained. A RAID set is a collection of disks on which a fixed number of consecutively
addressable disk blocks is defined. These disk blocks are called strips, and the set of aligned
strips across the disks in the set is called a stripe.
Mirroring is easy to understand and is one of the most reliable data protection methods. In
this method, you simply make a copy of the disk you want to protect, and in this
way keep two copies of the data.
Parity: Mirroring involves high cost, so to protect data a newer technique uses a stripe-based
construct called parity. This is a reliable and low-cost data protection solution. In this method,
an extra HDD or disk is added to the stripe width to hold the parity bits.
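How parity lets a failed disk's strip be rebuilt can be illustrated with XOR, which is the operation RAID 5-style parity uses; the strip values below are arbitrary examples.

```python
# XOR parity sketch: the parity strip is the XOR of the data strips,
# so any single missing strip can be rebuilt by XOR-ing the survivors.
data_strips = [0b1010, 0b0110, 0b1100]          # strips on three data disks
parity = data_strips[0] ^ data_strips[1] ^ data_strips[2]

# Suppose disk 1 fails: rebuild its strip from parity plus the survivors.
rebuilt = parity ^ data_strips[0] ^ data_strips[2]
print(rebuilt == data_strips[1])  # → True
```

This is why a parity RAID set survives exactly one disk failure: with two strips missing, the XOR of the survivors no longer pins down either value.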
Exercise 2: Storage Design (2 marks)
Summarize storage repository design based on the following video link:
https://www.youtube.com/watch?v=eVQH7C3nulY
Repositories are essentially logical disk space provided by file systems on top of the
physical storage hardware. If a repository is created on a file server, such as an NFS share,
the file system already exists; if a repository is created on a LUN, an OCFS2 file system is first
created on it. Before you begin the configuration, you must have access to an NFS-based repository
and a LUN-based repository.
Below YouTube link describes the Intelligent Storage System
https://www.youtube.com/watch?v=raTIRsMi7zk
Based on the watched video answer the following questions:
What is ISS?
An ISS (intelligent storage system) is a feature-rich RAID array that provides highly optimized
I/O processing capabilities. It provides a large cache and different I/O methods to improve
performance. The ISS operating environment also provides intelligent cache management, array
resource administration, and connection of heterogeneous hosts. It supports virtual provisioning,
flash drives, and automated storage tiering.
What are the 4 main components of the ISS?
The video mentions four main components of an ISS: the front end, cache, back end, and physical
disks.
Storage Area Network (SAN) and Network Attached Storage (NAS) are widely used concepts in
data storage arena. The following YouTube video links gives detailed description of these
concepts:
http://www.youtube.com/watch?v=csdJFazj3h0
http://www.youtube.com/watch?v=vdf6CvGQZrk
https://www.youtube.com/watch?v=KxdfGcynfJ0
https://www.youtube.com/watch?v=4RsLUTJ_Qtk
Based on the watched videos answer the following questions:
Describe NAS and SAN briefly using diagrams?
A Storage Area Network (SAN) is a high-performance network whose primary purpose is to enable
communication between computer systems and storage devices.
Network Attached Storage (NAS) is a specialized file storage device that provides file-based
shared storage for local area network (LAN) nodes over standard Ethernet connections. (Rouse,
2015)
The SAN organizes storage resources on a separate, high-performance network. The key difference
between NAS and SAN is that network-attached storage handles individual file input/output (I/O)
requests, while the storage area network manages data-block I/O requests.
What are the advantages of SAN over NAS?
The major advantages of NAS:
Supports comprehensive access to data
Improved efficiency
Improved flexibility
Centralized storage
Simplified administration
Scalability
The major advantages of SAN:
Good disk utilization
Disaster recovery for various applications
Improved application availability
Reduced backup time
What are two common NAS file sharing protocols? How are they different from each other?
The two general NAS file sharing protocols are:
• Common Internet File System (CIFS)
• Network File System (NFS)
CIFS is implemented in Microsoft environments and is based on the Server Message Block (SMB)
protocol, while NFS is used in UNIX environments.
Part B
Exercise 3: Storage Design (1 Mark)
Design Storage Solution for New Application
Scenario
An organization is deploying a new company application in its environment. The new
application needs 1 TB of storage for the application and business data. During the peak
workload period, the application is expected to generate 4900 IOPS with a typical I/O block size
of 4 KB. The vendor-supplied disk drive option is a 15,000 rpm drive with a capacity of
100 GB. The drive specification is: average seek time equal to 5 ms, data
transfer rate equal to 40 MB/sec. You are required to calculate the required number of disk
drives that can meet both the capacity and performance requirements of the application (Stonebraker,
2010).
Hint: In order to calculate the IOPS from average seek time, data transfer rate, disk rpm and
data block size refer slide 15 in week 7 lecture slide. Once you have IOPS, refer slide 16 in
week 7 to calculate the required number of disks.
Dc = 1 TB / 100 GB = 10 disks
Ts = 0.005 s + 0.5 / (15,000 rpm / 60) + 4 KB / (40 MB/s) ≈ 0.0071 s
S = 1 / 0.0071 ≈ 141 IOPS per disk
0.7 × S ≈ 99 IOPS per disk at 70% utilisation
Dp = 4900 / 99 ≈ 50 disks
Therefore, the required number of disk drives is max(10, 50) = 50 disks.
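The calculation can be spelled out step by step. The inputs are taken from the scenario, and the 70% utilisation ceiling follows the week 7 lecture slides.

```python
# Disk-sizing arithmetic for the scenario above.
import math

capacity_needed_gb = 1000          # 1 TB of application data
disk_capacity_gb = 100
peak_iops = 4900
seek_time_s = 0.005                # 5 ms average seek
rpm = 15_000
transfer_rate_mb_s = 40
io_size_kb = 4

rotational_latency_s = 0.5 / (rpm / 60)            # half a revolution on average
transfer_time_s = (io_size_kb / 1024) / transfer_rate_mb_s
service_time_s = seek_time_s + rotational_latency_s + transfer_time_s

iops_per_disk = 1 / service_time_s                 # ≈ 141
usable_iops = 0.7 * iops_per_disk                  # ≈ 99 at 70% utilisation

disks_for_capacity = math.ceil(capacity_needed_gb / disk_capacity_gb)   # 10
disks_for_performance = math.ceil(peak_iops / usable_iops)              # 50
print(max(disks_for_capacity, disks_for_performance))  # → 50
```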
Exercise 4: Storage Evolution (2 Marks)
Watch the following videos for Fiber Channel over Ethernet and answer the questions that
follow:
http://www.youtube.com/watch?v=hSFyf-rmjA8
http://www.youtube.com/watch?v=iCfJCzfNLrw
What is FCoE and why we need FCoE?
The Fibre Channel over Ethernet storage protocol enables Fibre Channel communications to
operate directly over Ethernet (Rouse, 2012). FCoE lets existing high-speed Ethernet
infrastructure carry storage traffic, converging IP and Fibre Channel onto a single cable
and interface.
The goal of FCoE is to unify I/O and reduce the complexity of the switch fabric. It can also
reduce the number of links and interfaces. In addition, using FCoE promotes sustainability by
reducing energy use and cooling requirements, saving users money.
In your opinion, how is FCoE more cost effective than a traditional connection? Give a brief explanation.
• Traditional connections require multiple network adapters and multiple cabling systems, but
with FCoE, each server requires only one adapter and one Ethernet system.
• Reduced equipment shrinks the environmental footprint through savings in energy, space, power,
and cooling;
• Maintenance procedures are simplified because there are fewer systems and less equipment (Garg, 2016).
You have read and answered about SAN in part A – based on your understanding and with some
research effort, answer the following questions:
What is a Virtual SAN?
Virtual SAN is a software-defined storage product provided by VMware that allows enterprises to
share storage capabilities and instantly provision virtual machine storage through simple virtual
machine-driven policies.
What is IP SAN protocols and FibreChannel over IP (FCIP)?
IP SAN protocol:
An IP SAN is a dedicated storage area network (SAN) that allows multiple servers to access a
pool of shared block storage devices using storage protocols that rely on the Internet
Engineering Task Force (IETF) standard Internet protocol suite.
FCIP: Fibre Channel over IP (FCIP) is an important technology for linking Fibre Channel
storage area networks (SANs). FCIP and iSCSI are complementary solutions that enable
company-wide storage access. FCIP transparently interconnects Fibre Channel (FC) SAN
islands over IP networks, while iSCSI allows IP-connected hosts to access iSCSI or FC-attached
storage.
Watch the below video about Introduction to Object-based and Unified Storage:
https://www.youtube.com/watch?v=kl9X6mzEWO4
Choose the correct answer from the following questions:
What is an advantage of a flat address space over a hierarchical address space?
a. Highly scalable with minimal impact on performance
b. Provides access to data, based on retention policies
c. Provides access to block, file, and object with same interface
d. Consumes less bandwidth on network while accessing data
What is a role of metadata service in an OSD node?
a. Responsible for storing data in the form of objects
b. Stores unique IDs generated for objects
c. Stores both objects and objects IDs
d. Controls functioning of storage devices
What is used to generate an object ID in a CAS system?
a. File metadata
b. Source and destination address
c. Binary representation of data
d. File system type and ownership
What accurately describes block I/O access in a unified storage?
a. I/O traverse NAS head and storage controller to disk
b. I/O traverse OSD node and storage controller to disk
c. I/O traverse storage controller to disk
d. I/O is directly sent to the disk
What accurately describes unified storage?
a. Provides block, file, and object-based access within one platform
b. Provides block and file storage access using objects
c. Supports block and file access using flat address space
d. Specialized storage device purposely built for archiving
What is the Greenhouse effect?
The heating of the earth's atmosphere due to an increase in gases such as carbon dioxide.
We are legally, ethically, and socially required to green our IT products, applications,
services, and practices – is this statement true? Why?
True.
Because we have a responsibility to maintain a healthy environment for the world in which we live, and to preserve it for our future generations.
What is Green IT and what are the benefits of greening IT?
Green IT is the practice of environmentally sustainable computing. It achieves energy conservation through design, management control and delivery, while reducing the environmental burden.
Benefits: energy savings, environmental protection, lower radiation, and recyclability (Gomes, Tolosana-Calasanz & Agoulmine, 2015).
Exercise 2: Environmental Sustainability (0.5 Marks)
Read the article in the below link and answer the questions that follow:
http://www.computer.org/csdl/mags/it/2010/02/mit2010020004.html
According to the article how do you build a greener environment?
1. Coordinate, redesign and optimize supply chains, manufacturing activities and organizational workflows to minimize the impact on the environment;
2. Make company operations, buildings and other systems energy efficient;
3. Analyze, model and simulate environmental impact;
4. Provide a platform for environmental administration as well as emissions trading;
5. Audit and report energy utilization and savings;
6. Provide environmental knowledge management systems, decision support systems and environmental ontologies; and
7. Integrate and aggregate environmental monitoring network data (Gomes, Tolosana-Calasanz & Agoulmine, 2015).
Summarize the article in 150 words
The article discusses global warming and how it affects the environment, how to establish a green IT environment to mitigate the greenhouse effect, and finally the development prospects of the green IT industry.
Exercise 3: Environmentally Sound Practices (1 Mark)
The questions in this exercise can be answered by doing internet search.
Briefly explain the following terms – a paragraph for each term:
Power usage effectiveness (PUE) and its reciprocal
Power usage effectiveness (PUE) is a metric used to measure the energy efficiency of a data centre. It is calculated by dividing the total power entering the data centre by the power used to run the IT (computing) infrastructure inside it. An ideal value is 1.0; the lower the value, the more efficient the facility. Its reciprocal is the data centre infrastructure efficiency (DCiE).

Data center efficiency (DCE)
DCE is the practice of increasing resource utilization and eliminating unused capacity in the data centre.

Data center infrastructure efficiency (DCiE)
DCiE is the reciprocal of PUE: the IT equipment power divided by the total facility power, usually expressed as a percentage.
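A minimal sketch of the two metrics above (the function names and the 1600/1000 kW figures are illustrative examples, not from any source):

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """PUE = total facility power / IT equipment power (ideal = 1.0)."""
    return total_facility_kw / it_equipment_kw

def dcie(total_facility_kw: float, it_equipment_kw: float) -> float:
    """DCiE is the reciprocal of PUE, expressed as a percentage."""
    return 100.0 * it_equipment_kw / total_facility_kw

# A data centre drawing 1600 kW in total, of which 1000 kW runs IT gear:
print(pue(1600, 1000))   # 1.6
print(dcie(1600, 1000))  # 62.5
```

So a PUE of 1.6 means that for every watt delivered to IT equipment, another 0.6 W goes to cooling, power distribution and other overhead.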
List 5 universities who offers Green Computing course. You should name the university, the course name
and the brief description about the course.
1. University of Hertfordshire – Green Computing. Covers environmental performance in energy and water, waste, transportation, production, sustainable procurement, environmental awareness and biodiversity management.
2. University of Sydney – Green IT and Cloud Computing. Addresses various issues related to cloud computing (and data centre) technology.
3. Australian National University – ICT Sustainability. Covers how to assess and reduce the carbon footprint and the materials used by computers and telecommunications, and strategies to reduce the environmental impact of computers while making businesses more energy efficient.
4. Brandeis University – Green Computing. A student-led effort to reduce energy use and carbon emissions, and to raise awareness of the environmental impact of computer use at Brandeis University.
5. Carnegie Mellon University – Green Computing. Introduces students to the emerging area of "Green Computing" and divides them into two tracks: "Energy-Saving Computing" and "Applying Computing to Sustainability" (Khan, Shah & Nusratullah, 2015).
Exercise 4: Major Cloud APIs (1 Mark)
The following companies are the major cloud service provider: Amazon, GoGrid, Google, and
Microsoft.
List and briefly describe (3 lines for each company) the Cloud APIs provided by the above
major vendors.
Google: The AdSense API enables developers to integrate AdSense signup, ad-unit management and reporting into a web or blog hosting platform.
Google's free AdWords API service allows developers to design computer programs that interact directly with the AdWords servers.
The Google Checkout API enables merchants to integrate their existing e-commerce systems with Google Checkout, communicate order status to buyers, and take advantage of the features offered by the service.
Part B (3 Marks)
Exercise 1: Greening IT Standards and Regulations (0.5 Marks)
To design green computers and other IT hardware – the following standards and regulations are
mainly used EPEAT (www.epeat.net), the Energy Star 4.0 standard, and the Restriction of
Hazardous Substances Directive (https://www.gov.uk/guidance/rohs-compliance-and-guidance).
Use the links provided, along with some internet search, and summarize each standard and regulation in
150 words.
Standards and the related certification and labelling schemes are key features of products and services
in a sustainable supply chain. The number of "green" schemes has grown rapidly in recent years and
now includes more than 400 environmental labels. Standards and regulations are powerful tools for
driving green-growth strategic frameworks and the sustainable development goals (SDGs) because they
encourage improvements in energy efficiency, emission standards, market competition in production,
resource utilization, trade and foreign direct investment, and voluntary private-sector initiatives.
By giving consumers information about products and production processes, and by providing clear policy
signals for businesses, these tools can be effective in achieving environmental objectives and
facilitating best practice in markets for sustainable goods and services.
Exercise 2: Green cloud computing (0.5 Marks)
Xiong, N.; Han, W.; Vandenberg, A, "Green cloud computing schemes based on networks: a
survey," Communications, IET, vol.6, no.18, pp.3294,3300, Dec. 18 2012
Most of the energy consumption in data centres comes from computation, disk storage, networking and
cooling systems. Nowadays, new technologies and techniques are being proposed to reduce energy costs
in data centres. From the above paper, outline (in 300 words) the current work done in these fields.
The authors are particularly aware that green cloud computing (GCC) is a wide-ranging and active research
area. The distinction between the 'user' and the 'cloud-based energy resource provider' can be critical
for GCC's global ecosystem. A user submits a service request to a cloud service provider over an Internet
connection or a wired/wireless network. The requested service is returned to the user in time, while
information storage and processing, interoperation protocols, service configuration, communication and
distributed computers interact easily across all the networks (Smith, 2014).
Exercise 3: Cloud API Functionalities (2 Marks)
List the functionalities that can be achieved by using the APIs mentioned in the following link:
https://code.google.com/p/sainsburys-nectar-api/
Retrieve account details (currency, name, currency value, points balance, account type and exceptions)
Retrieve offers (offer ID, offer information, valid-from date, valid-to date)
Select an offer (discount)
What API is used in the following link and how it is used?
https://pypi.python.org/pypi/python-novaclient
The python-novaclient package (the Python client for the OpenStack Compute/Nova API) is used. It authenticates with an OpenStack username and password:
--os-username, --os-password and --os-tenant-name
export OS_USERNAME=openstack
export OS_PASSWORD=yadayada
export OS_TENANT_NAME=myproject
Define the authentication URL and the compute API version:
--os-compute-api-version
export OS_AUTH_URL=http://example.com:8774/v2/
export OS_COMPUTE_API_VERSION=2
Keystone - export OS_AUTH_URL=http://example.com:5000/v2.0/
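As a hedged sketch of how the environment variables above are consumed, a client script can collect the OS_* settings before authenticating against Keystone. The credential values are the demo values from this exercise, not real ones, and `resolve_credentials` is a hypothetical helper, not part of python-novaclient:

```python
import os

# Demo values from this exercise; a real deployment would already have
# these exported in the shell (see the export lines above).
os.environ["OS_USERNAME"] = "openstack"
os.environ["OS_PASSWORD"] = "yadayada"
os.environ["OS_TENANT_NAME"] = "myproject"
os.environ["OS_AUTH_URL"] = "http://example.com:5000/v2.0/"

def resolve_credentials() -> dict:
    """Collect the auth settings a nova client would pass to Keystone."""
    return {
        "username": os.environ["OS_USERNAME"],
        "password": os.environ["OS_PASSWORD"],
        "tenant_name": os.environ["OS_TENANT_NAME"],
        "auth_url": os.environ["OS_AUTH_URL"],
    }

creds = resolve_credentials()
print(creds["auth_url"])  # http://example.com:5000/v2.0/
```

The `--os-username`, `--os-password` and `--os-tenant-name` command-line flags override the corresponding environment variables when both are supplied.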
Openstack is an open source collaborative software project which meets many of the cloud needs.
Below links gives vast information about Openstack.
https://support.rc.nectar.org.au/docs/openstack
http://docs.openstack.org/api/quick-start/content/
Write a report (1 page) about the Openstack features and functionalities.
The first is control. An open-source platform means there is no vendor lock-in, and its modular design allows integration of traditional and third-party technologies to meet business requirements. With it, IT teams can become their own cloud compute service providers. Building and maintaining an open-source private cloud is not for every business, but for those with the developers and infrastructure, it can be the best choice (Khan, Shah & Nusratullah, 2015).
The second is compatibility. Its public-cloud compatibility allows companies to easily migrate data and applications in the future, based on public-cloud strategies for security, economy and other mission-critical business standards.
The third is scalability. The mainstream Linux operating systems, including Fedora and SUSE, support it.
The fourth is flexibility. Flexibility is one of its greatest strengths: users can build their infrastructure according to their needs and can easily enlarge the cluster size.
The fifth is industry standard. More than 60 leading global companies from over ten countries, including Dell, Cisco, Microsoft and Intel, participate in the project.
The sixth is practical testing. Practice is the sole criterion for testing truth: OpenStack has been verified by the worldwide operation of large public and private clouds ("Green Computing and its Applications in Different Fields", 2017).
References
Abouelmehdi, K., Beni-Hessane, A., & Khaloufi, H. (2018). Big healthcare data: preserving security and
privacy. Journal Of Big Data, 5(1). doi: 10.1186/s40537-017-0110-7
Bughin, J. (2016). Big data, Big bang?. Journal Of Big Data, 3(1). doi: 10.1186/s40537-015-0014-3
Corea, F. (2016). Can Twitter Proxy the Investors' Sentiment? The Case for the Technology Sector. Big Data
Research, 4, 70-74. doi: 10.1016/j.bdr.2016.05.001
Garg, P. (2016). A green step towards computing: Green cloud computing. International Journal Of Research
Studies In Computing, 5(2). doi: 10.5861/ijrsc.2016.1518
Gomes, D., Tolosana-Calasanz, R., & Agoulmine, N. (2015). Introduction to special issue on Green Mobile
Cloud Computing (Green MCC). Sustainable Computing: Informatics And Systems, 8, 37. doi:
10.1016/j.suscom.2015.11.002
Green Computing and its Applications in Different Fields. (2017). International Journal Of Recent Trends In
Engineering And Research, 3(2), 185-189. doi: 10.23883/ijrter.2017.3023.6yhea
Hoskins, M. (2014). Common Big Data Challenges and How to Overcome Them. Big Data, 2(3), 142-143. doi:
10.1089/big.2014.0030
Jee, K., & Kim, G. (2013). Potentiality of Big Data in the Medical Sector: Focus on How to Reshape the
Healthcare System. Healthcare Informatics Research, 19(2), 79. doi: 10.4258/hir.2013.19.2.79
Khan, M., Fahim Uddin, M., & Gupta, N. (2014). Seven V’s of Big Data Understanding Big Data to extract
Value. Retrieved from http://asee-ne.org/proceedings/2014/Professional%20Papers/113.pdf
Khan, N., Shah, A., & Nusratullah, K. (2015). Adoption of Virtualization in Cloud Computing. International
Journal Of Green Computing, 6(1), 40-47. doi: 10.4018/ijgc.2015010104
Kościelniak, H., & Puto, A. (2015). BIG DATA in Decision Making Processes of Enterprises. Procedia
Computer Science, 65, 1052-1058. doi: 10.1016/j.procs.2015.09.053
Smith, B. (2014). Green computing. Boca Raton, Fla: CRC Press.
Stonebraker, M. (2010). SQL databases v. NoSQL databases. Communications Of The ACM, 53(4), 10. doi:
10.1145/1721654.1721659
Vis, F. (2013). A critical reflection on Big Data: Considering APIs, researchers and tools as data makers. First
Monday, 18(10). doi: 10.5210/fm.v18i10.4878
Wrox. (2011). Professional NoSQL.