This article discusses the basics of Big Data, NoSQL databases, and MapReduce. It explains why traditional relational databases are not effective for storing Big Data and introduces NoSQL databases as an alternative. It also describes the concept of MapReduce and how it can be used in Big Data processing.
ITECH 2201 Cloud Computing
School of Science, Information Technology & Engineering
Workbook for Week 6 (Big Data)

Please note: All efforts were taken to ensure the given web links are accessible. However, if they are broken, please use any appropriate video/article and refer to it in your answer.

Part A (4 Marks)

Exercise 1: Data Science (1 mark)
Read the article at http://datascience.berkeley.edu/about/what-is-data-science/ and answer the following:

What is Data Science?
Data science is the study of what information represents, where it comes from and how it can be turned into a valuable resource for creating IT strategies and businesses. Mining huge amounts of structured and unstructured data can identify patterns that help businesses achieve cost efficiencies, competitive advantage and new market opportunities.

According to IBM estimation, what is the percent of the data in the world today that has been created in the past two years?
According to IBM's estimation, 90 percent of the data in the world today has been created in the past two years.

What is the value of petabyte storage?
A petabyte is 10^15 bytes, that is, 1,000 terabytes or 1,000,000 gigabytes.
For each course, both foundation and advanced, that you find at http://datascience.berkeley.edu/academics/curriculum/, briefly state (in 2 to 3 lines) what they offer, based on the given course description as well as the video. The purpose of this question is to understand the different streams available in Data Science.
In the foundation courses, students with a strong object-oriented programming (OOP) background complete 12 units of coursework, while students less experienced in OOP complete 15 units. The foundation coursework includes Python for Data Science, Research Design and Applications for Data and Analysis, Statistics for Data Science, Fundamentals of Data Engineering, and Applied Machine Learning.
The advanced courses include Experiments and Causality; Behind the Data: Human Values; Scaling Up Really Big Data; Statistical Methods for Discrete Response, Time Series and Panel Data; Machine Learning at Scale; Natural Language Processing with Deep Learning; and Data Visualization.

Exercise 2: Characteristics of Big Data (2 marks)
Read the following research paper from the IEEE Xplore Digital Library:
Ali-ud-din Khan, M.; Uddin, M.F.; Gupta, N., "Seven V's of Big Data: understanding Big Data to extract value," American Society for Engineering Education (ASEE Zone 1), 2014 Zone 1 Conference, pp. 1-5, 3-5 April 2014
and answer the following questions.

Summarise the motivation of the author (in one paragraph)
The author's motivation is the simple fact that big data is now everywhere in our lives and is the solution to many problems present in today's industries. Big data provides the raw material for building the technology of the future. It is used in every aspect of our lives, from small and large businesses to film making, law enforcement and entertainment, and by large corporations such as Amazon, Facebook and Google. To use big data to its full advantage, the web needs to be chosen as the data pool, since most people nowadays access the web and mobile apps most of the time. The largest generator of data is probably Google, which has changed the market scenario by introducing big-data technologies such as MapReduce, Hadoop and Google BigTable. The author stresses that big data will help revolutionise other fields such as biological research, politics, sustainability, environmental research, finance and education.

What are the 7 v's mentioned in the paper? Briefly describe each V in one paragraph.
The 7 V's mentioned in the paper are volume, velocity, variety, veracity, validity, volatility and value. Volume refers to the fact that big data is created from many sources, such as natural disasters, weather forecasting, crime reports, space images, medical and research studies, networking, and social, video, audio and text content; this volume of data can be extracted from social media, GPS trails, government documents, telemetry, web pages and so on. The second aspect is velocity: big data needs to be transferred and processed at an adequate speed, the receiving system must have infrastructure capable of handling the data, and the feedback loop from input to decision should be fast. The third aspect is variety: big data comes from different sources and can take the form of names, images, text, video and audio, with users uploading data from different browsers to different clouds. The fourth V is veracity, which concerns the trustworthiness and accuracy of the data. Next come validity, meaning whether the data is correct and applicable for its intended use, volatility, meaning how long the data remains valid and should be stored, and value, meaning the usefulness that can ultimately be extracted from the data.

Explore the author's future work by using the reference [4] in the research paper. Summarise your understanding how Big Data can improve the healthcare sector.
Big data can improve healthcare by:
Personalised treatments and medicines
Better treatment
Preventing malicious behaviour
Exercise 3: Big Data Platform (1 mark)
In order to build a big data platform, one has to acquire, organize and analyse the big data. Go through the following links and answer the questions that follow the links:
− http://www.infochimps.com/infochimps-cloud/how-it-works/
− http://www.youtube.com/watch?v=TfuhuA_uaho
− http://www.youtube.com/watch?v=IC6jVRO2Hq4
− http://www.youtube.com/watch?v=2yf_jrBhz5w
Please note: You are encouraged to watch all the videos in the series from Oracle.

How to acquire big data for enterprises and how it can be used?
Big data is being used to improve operational efficiency, and the ability to make sound decisions based on the very latest up-to-the-minute information is rapidly becoming the norm. The main objective of big data is to enable organisations to make more informed business decisions by allowing data scientists, predictive modellers and other analytics professionals to analyse large volumes of transaction data, as well as other forms of data that conventional business intelligence programs may leave untapped.

How to organize and handle the big data?
To organize and handle big data efficiently, the first step is to break the information down into datasets and reduce the amount of data to be managed. Next, use the power of virtualization technology (Baldini et al. 2016). Organisations should virtualize this unique dataset so that multiple applications can reuse the same data footprint, and so that the smaller footprint can be stored on any vendor-independent storage device. Virtualization is the tool organisations can use to tackle the big data management challenge: by reducing the data, virtualizing its reuse and storage, and centralising the management of the dataset, big data is ultimately turned into small data and managed like virtual data.
What are the analyses that can be done using big data?
Several kinds of analysis can be conducted with the help of big data. Financial institutions use big data analytics to analyse risks such as anti-money-laundering, fraud mitigation and know-your-customer initiatives. Media and entertainment companies need to deliver real-time content to meet the growing demands of customers across many formats and a variety of devices, such as billboards, TV, YouTube and more; their main initiative is to use big data to deliver real-time content across different media, with Wimbledon sentiment analysis, Amazon Prime and Spotify as live examples. The healthcare industry is in greatest need of big data analytics: it holds a huge amount of data, from blood test results to prescriptions and medical records, but due to a lack of proper analysis the health sector has historically failed to use this data to control costs and obtain medical benefits. Humedica, Obamacare and Cerner are examples of such initiatives.

Part B (4 Marks)
Part B answers should be based on well-cited articles/videos; name the references used in your answer. For more information read the guidelines as given in Assignment 1.

Exercise 4: Big Data Products (1 mark)
Google is a master at creating data products. Below are a few examples from Google. Describe the below products and explain how the large-scale data is used effectively in these products.

a. Google's PageRank
PageRank is what Google uses to decide the importance of a web page. It is one of many factors used to determine which pages appear in search results. PageRank tries to measure a web page's significance, and it does not stop at the popularity of the link: it also estimates the importance of the page that contains the link. Pages with higher PageRank have more weight when "voting" with their links than pages with lower PageRank. It also accounts for the number of links on the page casting the "vote": pages with more outgoing links contribute less value per link.
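The voting idea can be illustrated with a short power-iteration sketch. This is a minimal illustration of the algorithm, not Google's actual implementation; the tiny link graph, the damping factor of 0.85 and the convergence tolerance are assumptions made up for the example.

# Minimal PageRank power-iteration sketch (illustrative, not Google's code).
# The link graph, damping factor and tolerance are assumed for the example.

def pagerank(links, damping=0.85, tol=1e-6):
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    while True:
        new_rank = {}
        for p in pages:
            # A page's rank is a share of the rank of every page linking to it,
            # divided by how many outgoing links the voting page has.
            incoming = sum(rank[q] / len(links[q]) for q in pages if p in links[q])
            new_rank[p] = (1 - damping) / len(pages) + damping * incoming
        if max(abs(new_rank[p] - rank[p]) for p in pages) < tol:
            return new_rank
        rank = new_rank

graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
print(pagerank(graph))

Note how a page with two outgoing links (A above) passes only half of its rank through each link, which is exactly the "pages with more links have less value per vote" behaviour described above.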
b. Google's Spell Checker
Google's spell check is a very old feature that Google has been continually improving. Google uses both its web index and its query-processing algorithms to decide whether the word you type needs refinement. Sometimes Google does not give the original query any chance at all: it searches directly for the "correct" spelling. Such queries may have the lowest QR.

c. Google's Flu Trends
Google Flu Trends is now no longer publishing current estimates. The service, operated by Google, provided estimates of influenza activity for more than 25 countries. By aggregating Google Search queries, it attempted to make accurate estimates about flu activity.

d. Google's Trends
Google Trends is a trend-searching application that shows how frequently a given search term is entered into Google's search engine relative to the site's total search volume over a given period of time. It can be used for comparative keyword research and to discover event-triggered spikes in keyword volumes. It provides keyword-related data including a search volume index and geographical information about search engine users.

Like Google, Facebook and LinkedIn also use large-scale data effectively. How?
LinkedIn, Facebook and Google use large-scale data by analysing user behaviour and interactions. The large-scale data is then analysed to find patterns.

Exercise 5: Big Data Tools (2 marks)
Briefly explain why a traditional relational database (RDBMS) is not effectively used to store big data?
An RDBMS is not usually used for storing big data for the following reasons. First, the size of the data has expanded hugely, into the range of petabytes, and an RDBMS finds it challenging to handle such huge data volumes; to address this, more CPUs or more memory must be added to the database management system to scale up vertically. Second, most of the data arrives in a semi-structured or unstructured format from social media, audio, video, email and messages, and this is outside the domain of an RDBMS because relational databases cannot categorise unstructured data; they are designed and structured to suit structured data such as weblog, sensor and financial data. Third, big data is generated at high velocity, and an RDBMS is lacking in velocity because it is designed for steady data retention rather than rapid growth. Even if an RDBMS is used to manage and store big data, it becomes very expensive. Thus, the inability of relational databases to handle big data led to the rise of new technologies.

What is NoSQL Database?
A NoSQL database provides a mechanism for the storage and retrieval of data that is modelled in means other than the tabular relations used in relational databases. It is an approach to database design that can accommodate a wide variety of data models, including graph, columnar, document and key-value formats.

Name and briefly describe at least 5 NoSQL Databases
Five kinds of NoSQL database are as follows:
Wide-column stores (Cassandra, HBase): data is stored in columns instead of rows as in a traditional SQL system. Any number of columns (and therefore many different types of data) can be grouped or aggregated as required for queries or data views.
Document databases (MongoDB, CouchDB): data is stored as JSON-like structures or documents, where the data can be anything from numbers to strings to free text. There is no inherent need to specify what fields, if any, a document will contain.
Multi-model databases (Cosmos DB, OrientDB): these support several of the other data models within a single engine.
Graph databases (Neo4j): data is represented as a network or graph of entities and their relationships, with every node in the graph a free-form chunk of data.
Key-value stores (Riak, Redis): values, from simple numbers or strings to complex JSON documents, are accessed in the database by means of keys.
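As a concrete illustration of the document model described in the list above, here is a minimal sketch using MongoDB's Python driver, pymongo. The local connection URL, the database name and the collection name are assumptions made up for the example; it presumes a MongoDB instance is running locally.

# Minimal document-store sketch using pymongo (assumed local MongoDB instance;
# database and collection names are made up for the example).
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db = client["workbook_demo"]

# Documents in the same collection need not share a schema.
db.people.insert_one({"name": "Alice", "age": 30, "skills": ["python", "sql"]})
db.people.insert_one({"name": "Bob", "city": "Ballarat"})  # different fields

# Query by field value; returns the matching JSON-like document.
print(db.people.find_one({"name": "Alice"}))

The two inserted documents deliberately have different fields, showing the schema flexibility that distinguishes document stores from relational tables.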
What is MapReduce and how it works?
MapReduce is a programming paradigm that allows for massive scalability across hundreds or thousands of servers in a Hadoop cluster. It works through shuffling, which is the process by which intermediate data from the mappers is transferred to zero, one or more reducers. Each reducer receives one or more keys and their associated values, depending on the number of reducers (for a balanced load). The values associated with each key are then locally sorted.

Briefly describe some notable MapReduce products (at least 5)
Some products and applications of MapReduce are as follows:
Distributed grep is used to search for a given pattern in a large number of files. For example, a web administrator can use distributed grep to search web server logs in order to find the most requested pages that match a given pattern.
With the technological advances in location-based services, there has been a huge surge in the amount of geospatial data. Geospatial queries (nearest-neighbour queries and reverse nearest-neighbour queries) consume a lot of computational resources, and it is observed that their processing is inherently parallelizable.
Digital Elevation Models (DEMs) are digital or 3D representations of the landscape, where every (X, Y) position is represented by a single elevation value. DEMs are also referred to as Digital Terrain Models (DTM) or Digital Surface Models (DSM). A DEM can be represented as a raster (a matrix of squares) or as a triangular irregular network (TIN), and can be generated from remotely sensed (satellite) or directly surveyed elevation data.
Count of URL access frequency: the map function processes logs of web page requests and outputs <URL, 1>. The reduce function adds all values for the same URL and produces a <URL, total count> pair.
Inverted index: the map function parses each document and emits a sequence of <word, document ID> pairs; the reduce function collects all pairs for a given word and emits a <word, list(document ID)> pair.
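The URL-access-frequency example above can be sketched in a few lines to show the map, shuffle and reduce stages. This is a single-process illustration of the paradigm, not Hadoop code; the sample log lines are made up for the example.

# Single-process sketch of the MapReduce flow for URL access counting.
# Illustrative only: real MapReduce distributes map and reduce tasks across a cluster.
from collections import defaultdict

log_lines = ["/home", "/about", "/home", "/home", "/contact"]  # assumed sample log

# Map: emit a <URL, 1> pair for every request.
mapped = [(url, 1) for url in log_lines]

# Shuffle: group intermediate values by key (done by the framework in Hadoop).
groups = defaultdict(list)
for url, count in mapped:
    groups[url].append(count)

# Reduce: sum the values for each URL to get <URL, total count>.
totals = {url: sum(counts) for url, counts in groups.items()}
print(totals)  # {'/home': 3, '/about': 1, '/contact': 1}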
Amazon's S3 service lets you store large chunks of data on an online service. List some 5 features for Amazon's S3 service.
The features are:
Unmatched durability
Comprehensive security
In-place query
Flexible management
Vendor support
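A minimal sketch of storing and retrieving an object with S3's Python SDK, boto3. The bucket name and object key are placeholders made up for the example, and the calls presume AWS credentials are already configured in the environment.

# Minimal S3 usage sketch with boto3 (assumes configured AWS credentials;
# bucket name and object key are made up for the example).
import boto3

s3 = boto3.client("s3")

# Store a chunk of data under a key in a bucket.
s3.put_object(Bucket="my-example-bucket", Key="week6/notes.txt",
              Body=b"Big data workbook notes")

# Retrieve it again.
obj = s3.get_object(Bucket="my-example-bucket", Key="week6/notes.txt")
print(obj["Body"].read())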
Getting concise, valuable information from a sea of data can be challenging. We need statistical analysis tools to deal with Big Data. Name and describe some (at least 3) statistical analysis tools.
Three statistical tools are described as follows:
MS Excel brings a wide variety of tools for visualisation and statistical analysis of your data. Data import from text files is as simple as generating summary statistics and customisable charts and figures.
Advantages:
• It offers a great deal of control and flexibility.
• It is widely available and relatively inexpensive for students and private entities.
• It does not require learning new techniques for manipulating data and drawing charts.
MATLAB is a general analysis framework, which requires programming skills to a much greater degree than Excel or SPSS.
Advantages:
• MATLAB offers specialised toolboxes for the analysis of data coming from eye tracking, EEG, ECG, EMG and so on, and for facial expression analysis.
• In MATLAB, analysis and processing steps and results can be completely customised.
• It offers academic licences at a reduced cost.
SPSS is statistical analysis software, covering both statistical and non-statistical test functionality. SPSS plots are commonly found in academic papers and business research reports.
Advantages:
• SPSS has efficient data management and offers a great deal of control over data organisation.
• It offers an extensive range of techniques, charts and graphs.
• SPSS keeps the results separate from the data itself, producing well-organised reports and worksheets containing the results.
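Alongside these GUI packages, the same kind of summary statistics can also be scripted. A minimal sketch using Python's standard statistics module, with sample values made up for the example:

# Minimal summary-statistics sketch with Python's standard library
# (sample values are made up for the example).
import statistics

samples = [4.1, 5.0, 4.7, 5.3, 4.9, 5.1]

print("mean:  ", statistics.mean(samples))
print("median:", statistics.median(samples))
print("stdev: ", statistics.stdev(samples))  # sample standard deviation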
Exercise 6: Big Data Application (1 mark)
Name 3 industries that should use Big Data
Industries such as web search, social media and microblogging, represented by Google, Facebook and Twitter, should use big data.

From your lecture and also based on the below given video link:
https://www.youtube.com/watch?v=_sXkTSiAe-A
Write a paragraph about memory virtualization.
Memory virtualization allows networked servers to share a pool of memory to overcome physical memory limitations, a common bottleneck in software performance. The memory pool may be accessed at the application level or the operating-system level. At the application level, the pool is accessed through an API or as a networked file system to create a high-speed shared memory cache. At the operating-system level, a page cache can use the pool as a large memory resource that is significantly faster than local or networked storage. Memory virtualization implementations are distinguished from shared memory systems.

Watch the below mentioned YouTube link:
https://www.youtube.com/watch?v=wTcxRObq738
Based on the video answer the following questions:

What is RAID 0?
Disk striping, or RAID 0, is a technique that splits up a file and spreads the data across all the disk drives in a RAID group. The advantage of RAID 0 is improved performance: since striping spreads data across more physical drives, multiple disks can access the contents of a file, allowing writes and reads to be completed more quickly. A disadvantage of RAID 0 is that it has no parity: should a drive fail, there is no redundancy and all data would be lost.

Describe Striping, Mirroring and Parity.
Disk striping is a technique in which multiple smaller disks act as a single large disk. The process divides large data into data blocks and spreads them across multiple storage devices. Disk striping provides the advantage of extremely large databases or a large single-table tablespace using only a single logical device.
Data mirroring is the ongoing process of copying data from one location to a local or remote storage medium. In short, a mirror is an exact copy of a dataset. Most commonly, it is used when multiple exact copies of data are required in multiple locations.
A parity drive is a hard drive used in a RAID array to provide fault tolerance. For example, RAID 3 uses it to create a system that is both fault tolerant and, thanks to data striping, fast. The XOR of all of the data drives in the RAID array is written to the parity drive.
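The parity idea can be demonstrated in a few lines: XOR the blocks of the data drives to form the parity block, then rebuild any single lost block from the survivors. The byte values below are made up for the example.

# Minimal RAID-parity sketch: the parity block is the XOR of the data blocks,
# so any single lost block can be rebuilt from the others (values are made up).
from functools import reduce

drive_a = bytes([0x10, 0x22, 0x35])
drive_b = bytes([0x0F, 0x11, 0x80])
drive_c = bytes([0xA5, 0x5A, 0x01])

def xor_blocks(*blocks):
    return bytes(reduce(lambda x, y: x ^ y, byte_tuple) for byte_tuple in zip(*blocks))

parity = xor_blocks(drive_a, drive_b, drive_c)

# Simulate losing drive_b and reconstructing it from the survivors plus parity.
rebuilt_b = xor_blocks(drive_a, drive_c, parity)
assert rebuilt_b == drive_b
print("rebuilt:", rebuilt_b.hex())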
Exercise 2: Storage Design (2 marks)
Summarize storage repository design based on the following video link:
https://www.youtube.com/watch?v=eVQH7C3nulY
In the mentioned video, a storage repository on a LUN is connected to a clustered server pool, as a result of the OCFS2 file system it employs. Thus, a server pool must exist with clustering enabled, and at least one server must be present in the clustered environment. Local server storage with a repository also belongs in this category, since local disks are always discovered as LUNs.

Below YouTube link describes the Intelligent Storage System:
https://www.youtube.com/watch?v=raTIRsMi7zk
Based on the watched video answer the following questions:

What is ISS?
Storage arrays that include RAID-rich arrays providing highly optimised I/O processing capabilities are generally referred to as Intelligent Storage Arrays or Intelligent Storage Systems. These storage systems have the capability to meet the requirements of today's I/O-intensive, cutting-edge applications, which demand high levels of performance, availability, security and scalability. Accordingly, to meet the requirements of these applications, many vendors of intelligent storage systems now support SSDs, deduplication, compression and encryption.

What are the 4 main components of the ISS?
The front end provides the interface between the host and the storage. It consists of two parts: front-end ports and front-end controllers. The front-end ports enable hosts to connect to the intelligent storage system, and each front-end port has processing logic that executes the appropriate transport protocol, such as SCSI, iSCSI or Fibre Channel, for storage connections.
Cache is a critical component that improves I/O performance in an intelligent storage system. It is semiconductor memory where data is placed temporarily to reduce the time required to service I/O requests from the host. Cache improves storage system performance by isolating hosts from the mechanical delays associated with physical disks, which are the slowest components of an intelligent storage system.
The back end provides an interface between the cache and the physical disks. It consists of two parts: back-end ports and back-end controllers. The back end controls data transfers between the cache and the physical disks: from the cache, data is sent to the back end and then routed to the destination disk.
The fourth component is the physical disks themselves.

How cache works in ISS?
ISS systems control the allocation, management and use of storage resources for faster data processing. These storage systems run with large amounts of cached memory and sophisticated algorithms to meet the I/O requests of even the most critical applications. Cache used in storage systems makes it quicker to retrieve data after the first lookup. Each write to cache memory is stored in two different locations on two memory cards and consists of the tag RAM and the data store: the tag RAM tracks the location of the data on the physical disks, while the data store holds the data that is being written or read. The copy serves as a backup if the cache fails in a single location. These techniques speed up I/O processing and greatly reduce the number of mechanical disk operations.
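To make the caching idea concrete, here is a minimal sketch of a read cache sitting in front of a slow "disk", using a small least-recently-used (LRU) eviction policy. The fake disk contents and the cache capacity are assumptions made up for the example; real ISS cache algorithms are considerably more sophisticated.

# Minimal read-cache sketch: an LRU cache in front of a slow "disk" lookup.
# The fake disk contents and the cache capacity are made up for the example.
from collections import OrderedDict

disk = {"block1": b"alpha", "block2": b"beta", "block3": b"gamma"}  # pretend slow storage

class LRUCache:
    def __init__(self, capacity=2):
        self.capacity = capacity
        self.entries = OrderedDict()

    def read(self, key):
        if key in self.entries:               # cache hit: no disk access needed
            self.entries.move_to_end(key)
            return self.entries[key]
        value = disk[key]                     # cache miss: go to the slow disk
        self.entries[key] = value
        if len(self.entries) > self.capacity: # evict the least recently used block
            self.entries.popitem(last=False)
        return value

cache = LRUCache()
cache.read("block1"); cache.read("block2")
cache.read("block1")   # hit: served from cache, no mechanical disk operation
cache.read("block3")   # miss: evicts block2, the least recently used
print(list(cache.entries))  # ['block1', 'block3']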
Storage Area Network (SAN) and Network Attached Storage (NAS) are widely used concepts in the data storage arena. The following YouTube video links give detailed descriptions of these concepts:
− http://www.youtube.com/watch?v=csdJFazj3h0
− http://www.youtube.com/watch?v=vdf6CvGQZrk
− https://www.youtube.com/watch?v=KxdfGcynfJ0
− https://www.youtube.com/watch?v=4RsLUTJ_Qtk
Based on the watched videos answer the following questions:

Describe NAS and SAN briefly using diagrams?
NAS is a dedicated server used for file storage and sharing. It is, in effect, a hard drive attached to a network, used for storage and accessed through an assigned network address. It works like a server for file sharing but does not permit other services (such as email or authentication). It allows the addition of more storage space to available networks even when the system is shut down during maintenance.
SAN is a high-speed network that provides block-level network access to storage. SANs are typically composed of switches, hosts, storage elements and storage devices that are interconnected using a variety of technologies, topologies and protocols. SANs may also span multiple sites.
What are the advantages of SAN over NAS?
A SAN is a dedicated network of storage devices (which can include tape drives, hard drives and more) all working together to provide excellent block-level storage, whereas a NAS is a single device/server/computing appliance sharing its own storage over the network.

What are two common NAS file sharing protocols? How they are different from each other?
Two common NAS file sharing protocols are NFS and AFP. The Network File System (NFS) is a client/server application that lets a computer user view, and optionally store and update, files on a remote computer as though they were on the user's own computer; the protocol is one of several distributed file system standards for NAS. AFP is the native file and printer sharing protocol for Macs, and it supports many unique Mac attributes that are not supported by other protocols, so for the best performance and 100% compatibility, AFP should be used.

Part B (3 Marks)

Exercise 3: Storage Design (1 Mark)
Design Storage Solution for New Application
Scenario
An organization is deploying a new business application in their environment. The new application requires 1 TB of storage space for business and application data. During peak workload, the application is expected to generate 4900 IOPS (I/Os per second) with a typical I/O data block size of 4 KB. The vendor's available disk drive option is a 15,000 rpm drive with 100 GB capacity. Other specifications of the drives are: average seek time = 5 milliseconds and data transfer rate = 40 MB/sec. You are required to calculate the number of disk drives that can meet both the capacity and performance requirements of the application.
Hint: In order to calculate the IOPS from average seek time, data transfer rate, disk rpm and data block size, refer to slide 28 in the week 6 lecture slides. Once you have the IOPS, refer to slide 29 in week 6 to calculate the required number of disks.
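A worked sketch of the calculation, assuming the standard disk service-time formula (average seek time + rotational latency + block transfer time) that the lecture slides appear to refer to; decimal units (1 TB = 1000 GB) are assumed.

# Worked sketch of the disk-sizing calculation, assuming the usual formula:
# service time = average seek time + rotational latency + block transfer time.
import math

seek_ms = 5.0                          # average seek time (given)
rpm = 15000
latency_ms = 0.5 * 60000 / rpm         # half a rotation = 2.0 ms
transfer_ms = 4 / (40 * 1000) * 1000   # 4 KB at 40 MB/s = 0.1 ms

service_ms = seek_ms + latency_ms + transfer_ms   # 7.1 ms per I/O
iops_per_disk = 1000 / service_ms                 # ~140 IOPS at full utilisation

disks_for_performance = math.ceil(4900 / iops_per_disk)  # 35 drives
disks_for_capacity = math.ceil(1000 / 100)               # 1 TB over 100 GB drives = 10

print(disks_for_performance, disks_for_capacity)
print("required:", max(disks_for_performance, disks_for_capacity))

Under these assumptions, each drive delivers roughly 140 IOPS, so performance needs about 35 drives while capacity needs only 10; the application therefore requires 35 drives, the larger of the two.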
Exercise 4: Storage Evolution (2 Marks)
Watch the following videos for Fibre Channel over Ethernet and answer the questions that follow:
− http://www.youtube.com/watch?v=hSFyf-rmjA8
− http://www.youtube.com/watch?v=iCfJCzfNLrw

What is FCoE and why we need FCoE?
Fibre Channel over Ethernet, or FCoE, is a protocol that enables Fibre Channel communication to run directly over Ethernet. It makes it possible to move Fibre Channel traffic across existing high-speed Ethernet infrastructure and converges storage and IP protocols onto a single cable, transport and interface. Fibre Channel supports high-speed data connections between computing devices that interconnect servers with shared storage devices, and between storage controllers and drives. FCoE shares Fibre Channel and Ethernet traffic on the same physical cable, or gives companies the option to separate Fibre Channel and Ethernet traffic on the same hardware.

In your opinion how FCoE is cost effective than traditional connection – give brief explanation.
FCoE switches can deliver adapter-to-switch-to-adapter latency of under 10 microseconds and port-to-port latency of roughly 3 microseconds, independent of packet size. They include ports at the rear for consistency with data-centre servers, permitting shorter and simpler cable runs within racks and reducing the cost of cabling and copper.

You have read and answered about SAN in Part A. Based on your understanding and with some research effort, answer the following questions:
What is a Virtual SAN?
Virtual SAN is software-defined storage from VMware that enables companies to pool their storage capabilities and to instantly and automatically provision virtual machine storage via simple policies that are driven by the virtual machine.

What is IP SAN protocols and FibreChannel over IP (FCIP)?
An IP SAN is a dedicated storage area network that enables multiple servers to access pools of shared block storage devices using storage protocols that rely on the Internet Engineering Task Force's standard Internet Protocol suite.
FCIP is a protocol used to connect Fibre Channel switches over an IP network, enabling the interconnection of remote sites. From the fabric's point of view, an FCIP link is an inter-switch link (ISL) that transports FC control and data frames between switches.

Watch the below video about Introduction to Object-based and Unified Storage:
https://www.youtube.com/watch?v=kl9X6mzEWO4
Choose the correct answer from the following questions:

What is an advantage of a flat address space over a hierarchical address space?
a. Highly scalable with minimal impact on performance
b. Provides access to data, based on retention policies
c. Provides access to block, file, and object with same interface
d. Consumes less bandwidth on network while accessing data

What is a role of metadata service in an OSD node?
a. Responsible for storing data in the form of objects
b. Stores unique IDs generated for objects
c. Stores both objects and objects IDs
d. Controls functioning of storage devices
What is used to generate an object ID in a CAS system?
a. File metadata
b. Source and destination address
c. Binary representation of data
d. File system type and ownership

What accurately describes block I/O access in a unified storage?
a. I/O traverses NAS head and storage controller to disk
b. I/O traverses OSD node and storage controller to disk
c. I/O traverses storage controller to disk
d. I/O is directly sent to the disk

What accurately describes unified storage?
a. Provides block, file, and object-based access within one platform
b. Provides block and file storage access using objects
c. Supports block and file access using flat address space
d. Specialized storage device purposely built for archiving

What is Greenhouse effect?
The greenhouse effect raises the temperature of the Earth by trapping heat in our atmosphere. This keeps the temperature of the Earth higher than it would be if direct heat from the Sun were the only source of warming.

We are legally, ethically, and socially required to green our IT products, applications, services, and practices – is this statement true? Why?
Yes, it is true, because it is our duty to create a future for coming generations in which they can live happily and sustainably.

What is Green IT and what are the benefits of greening IT?
Green IT is the practice of environmentally sustainable computing. It aims to minimise the negative environmental impact of IT operations by designing, manufacturing, operating and disposing of computers and computer-related products in an environmentally friendly manner.

Exercise 2: Environmental Sustainability (0.5 Marks)
Read the article in the below link and answer the questions that follow:
http://www.computer.org/csdl/mags/it/2010/02/mit2010020004.html

According to the article how do you build a greener environment?
A greener environment can be built through power management; data centre design, layout and location; the use of biodegradable materials; regulatory compliance; green metrics; carbon footprint assessment tools and methodology; and environment-related risk mitigation.

Summarize the article in 150 words
The article discusses the significant developments improving the energy efficiency of computers: virtualization, data centre design and operation, and power-aware software. Nonetheless, there are several Green IT areas that demand further research and development: technology adoption, environmental impact assessment, standards and regulation, and harnessing IT for environmental sustainability. To build a greener environment, we must adapt or end some of our old and familiar ways of doing things. To comprehensively and effectively address IT's environmental impact, we must adopt a holistic approach and green the entire IT life cycle.

Exercise 3: Environmentally Sound Practices (1 Mark)
The questions in this exercise can be answered by doing internet search. Briefly explain the following terms – a paragraph for each term:

Power usage effectiveness (PUE) and its reciprocal
PUE, or Power Usage Effectiveness, is a measure used to determine the energy efficiency of a data centre. It is calculated as the ratio of the total energy entering the facility (the IT equipment plus cooling and other overheads) to the "useful" energy consumed by the IT equipment itself.

Data center efficiency (DCE)
DCE is a more practical way to assess energy consumption, analysing the effective use of energy by the existing IT hardware relative to the performance of that equipment.

Data center infrastructure efficiency (DCiE)
DCiE is a metric used to determine the energy efficiency of a data centre. It was introduced by the Green Grid, an industry group focused on data centre energy efficiency. It is calculated by dividing IT equipment power by total facility power, and is thus the reciprocal of PUE.
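The two metrics are simple reciprocal ratios, as the short sketch below shows; the facility and IT power figures are made up for the example.

# PUE and DCiE are reciprocal ratios; the power figures below are made up.
total_facility_kw = 1500.0   # IT equipment plus cooling, lighting, losses
it_equipment_kw = 1000.0

pue = total_facility_kw / it_equipment_kw    # 1.5: lower is better, 1.0 is ideal
dcie = it_equipment_kw / total_facility_kw   # ~0.67, i.e. 67% of power reaches IT

print(f"PUE = {pue:.2f}, DCiE = {dcie:.0%}")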
List 5 universities who offer a Green Computing course. You should name the university, the course name and a brief description about the course.
Universities providing courses on green computing include the following:
University of Hertfordshire – foundations and strategies, green computing evaluation
University of Cambridge – advanced IT, covering the benefits of green computing
UMass Amherst College of Engineering – computing architecture
University of Victoria – Green IT
Karlstad University – foundations and strategies

Exercise 4: Major Cloud APIs (1 Mark)
The following companies are the major cloud service providers: Amazon, GoGrid, Google, and Microsoft.
List and briefly describe (3 lines for each company) the Cloud APIs provided by the above major vendors.
Google Cloud Platform, offered by Google, is a suite of cloud computing services that runs on the same infrastructure that Google uses internally for its end-user products, such as Google Search and YouTube. Amazon Web Services (AWS) is a secure cloud services platform, offering compute power, database storage, content delivery and other functionality to help organisations scale and grow (Lu et al. 2013). The AWS Cloud provides a broad set of infrastructure services, such as computing power, storage options,
networking and databases, delivered as a utility. Microsoft Azure is a continually expanding set of cloud services that helps your organisation address its business challenges; it offers the freedom to build, manage and deploy applications on a massive, global network using your favourite tools and frameworks. GoGrid is a cloud infrastructure service hosting Linux and Windows virtual machines managed by a multi-server API.

Part B (3 Marks)

Exercise 1: Greening IT Standards and Regulations (0.5 Marks)
To design green computers and other IT hardware, the following standards and regulations are mainly used: EPEAT (www.epeat.net), the Energy Star 4.0 standard, and the Restriction of Hazardous Substances Directive (https://www.gov.uk/guidance/rohs-compliance-and-guidance). Use the links provided, with some internet search, to summarize each standard or regulation in 150 words.
The Electronic Product Environmental Assessment Tool (EPEAT) is an easy-to-use resource for purchasers, manufacturers, resellers and others wanting to find or promote electronic products with positive environmental attributes. It was developed with the EPA and is managed by the Green Electronics Council (GEC). GEC maintains EPEAT's website and product registry and also reports the environmental benefits resulting from the purchase of EPEAT-registered products.
Energy Star is a government-backed symbol for energy efficiency, providing simple and impartial information that consumers and businesses rely on to make well-informed decisions. Thousands of industrial, commercial, utility, state and local organisations, including 40 percent of the Fortune 500, rely on their partnership with the U.S. Environmental Protection Agency to deliver cost-saving energy efficiency solutions.
The RoHS Directive is a set of criteria devised by the European Union to regulate the use of toxic materials in electrical and electronic devices, systems and toys. The Directive has been effective since 1 July 2006.

Exercise 2: Green cloud computing (0.5 Marks)
Xiong, N.; Han, W.; Vandenberg, A., "Green cloud computing schemes based on networks: a survey," Communications, IET, vol. 6, no. 18, pp. 3294-3300, Dec. 18 2012
Most of the power consumption in data centres comes from computation processing, disk storage, network and cooling systems. Nowadays, new technologies and methods have been proposed to reduce energy cost in data centres. From the above paper, summarize (in 300 words) the recent work done in these fields.
Virtualization technology allows several virtual machines to be created on one physical server, reducing the amount of hardware in use and improving the utilisation of resources. Organisations can also outsource their computation needs to the cloud, eliminating the need to maintain their own computing infrastructure. Data centre power consumption and cooling are two of the biggest energy issues that confront IT organisations today; cooling systems consume nearly half of the electrical energy of data centres. Using hot-water cooling, chillers are no longer required year-round, which means data-centre energy consumption can be reduced by up to 50%. More attractively still, direct utilisation of the collected thermal energy becomes feasible, either through synergies with district heating or in specific industrial applications.

Exercise 3: Cloud API Functionalities (2 Marks)
List the functionalities that can be achieved by using the APIs mentioned in the following link:
https://code.google.com/p/sainsburys-nectar-api/
The list of functionalities is as follows:
Migration from one application to the next
Central database
Data security in case of system failure

What API is used in the following link and how it is used?
https://pypi.python.org/pypi/python-novaclient
The OpenStack Compute API is used in the given link. Through this API, the service provides massively scalable, on-demand, self-service access to compute resources. Depending on the deployment, those compute resources may be virtual machines, physical machines or containers.
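A minimal sketch of using python-novaclient to talk to the Compute API. The credentials, project name and endpoint URL are placeholders made up for the example, and the exact authentication arguments vary between OpenStack releases (newer releases prefer a keystoneauth session).

# Minimal python-novaclient sketch (credentials and endpoint are placeholders;
# authentication details vary between OpenStack releases).
from novaclient import client

nova = client.Client("2",                       # Compute API version
                     username="demo",
                     password="secret",
                     project_name="demo-project",
                     auth_url="http://controller:5000/v3")

# List available flavors and the running servers in the project.
for flavor in nova.flavors.list():
    print("flavor:", flavor.name)
for server in nova.servers.list():
    print("server:", server.name, server.status)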
OpenStack is an open-source collaborative software project which meets many cloud needs. The links below give extensive information about OpenStack:
https://support.rc.nectar.org.au/docs/openstack
http://docs.openstack.org/api/quick-start/content/
Write a report (1 page) about the OpenStack features and functionalities.
OpenStack is a set of software tools for building and managing cloud computing platforms for public and private clouds. Backed by some of the biggest companies in software hosting, as well as thousands of individual community members, many consider it to be the future of cloud computing. It is managed by the OpenStack Foundation, a non-profit that oversees both development and community building around the project.
OpenStack lets users deploy virtual machines and other instances that handle different tasks for managing a cloud environment on the fly. It makes horizontal scaling easy, which means that tasks that benefit from running concurrently can easily serve more or fewer users on the fly by just spinning up more instances. For example, an application that needs to communicate with a remote server may be able to divide the work of communicating with each user across many different instances, all communicating with one another but scaling quickly and easily as the application gains users.
Most importantly, OpenStack is open-source software, which means that anyone who chooses to can access the source code, make any changes or modifications they need, and freely share these changes back out to the community at large. It also means that OpenStack has the benefit of thousands of developers all over the world working in tandem to develop the strongest, most robust and most secure product that they can.
OpenStack is made up of many different moving parts. Because of its open nature, anyone can add additional components to OpenStack to help it meet their needs. OpenStack Image provides discovery, registration and delivery services for disk and server images. Stored images can be used as templates, and the service can also be used to store and catalogue an unlimited number of backups. The Image Service can store disk and server images in a variety of back-ends, including Swift. OpenStack Object Storage is a scalable, redundant storage system: objects and files are written to multiple disk drives spread throughout servers in the data centre, with the OpenStack software responsible for ensuring data replication.
References
Baldini, I., Castro, P., Cheng, P., Fink, S., Ishakian, V., Mitchell, N., ... & Suter, P. (2016, May). Cloud-native, event-based programming for mobile applications. In Proceedings of the International Conference on Mobile Software Engineering and Systems (pp. 287-288). ACM.
Lu, Q., Zhu, L., Bass, L., Xu, X., Li, Z., & Wada, H. (2013, June). Cloud API issues: an empirical study and impact. In Proceedings of the 9th International ACM SIGSOFT Conference on Quality of Software Architectures (pp. 23-32). ACM.