Green Computing, Big Data, and Storage Design: A Workbook for Weeks 6-8
Workbook for Week 6
Exercise 1
Data science refers to the collection of data, processing it to extract value, and visualizing the processed data (Berkeley, 2018).
According to IBM, an estimated 2.5 quintillion bytes of data are generated every day, and since 2016 about 90% of the world's data has been created on the internet. The number of internet users grew from 2.4 billion in 2014 to 3.8 billion in 2017 (Micro Focus Blog, 2017).
The media industry today requires large storage for editing, production, and archiving because of the increasing resolution of the images being captured. A petabyte equals 1,024 terabytes; as the volume of data increases, the need for storage also increases, hence the need for petabyte-scale storage (Manovich, 2011).
The course offers seven foundation courses, seven advanced courses, and a synthetic capstone course. Students with prior knowledge of object-oriented programming take fewer units than students without it.
Exercise 2
The author aims to clarify what big data is and to explain the seventh V, value, and its importance to the health industry. The author also explains the challenges and issues that value raises for an industry.
The 7Vs
Big data can be defined using the 7Vs: velocity, volume, veracity, visualization, variety, value, and variability. Volume refers to how much data there is. Velocity refers to the speed at which data is generated and must be processed. Variety describes the different forms in which data can exist. Visualization is the use of charts and other visual tools to make the data easier to understand. Veracity concerns the accuracy and trustworthiness of the data. Variability refers to the inconsistency of data. Value refers to the usefulness of the data.
Impact on healthcare
Big data is used to support research. Companies such as Genome Health Solutions and GNS Healthcare use genomics to transform translational research. Healthcare could also be revolutionized by the ability to perform real-time analysis on data in motion, and converting unstructured data to structured data makes the information manageable by machines.
With the help of mobile phones, patients can access medical services by themselves. Patient support systems are user friendly and give providers more time to attend to patients, with fewer errors. Health awareness and education can be delivered by combining big data with the convenience of mobile phones.
Exercise 3 Big Data Platform
Big data can be acquired through Google searches, passport scans, supermarket barcode readings, CCTV footage, voice messages, and social media. Companies can use it in machine analysis to reveal patterns in human work and improve productivity.
Big data is organized through analysis performed on the captured data.
Big data analysis is done using algorithms, big data modeling, and the running of queries.
Exercise 4 Big Data Products
a) Google's PageRank is an algorithm that evaluates the quality of links pointing to a website in order to determine its importance. Page ranking relies on big data about the clicks and links users give to a website; a simplified sketch of the idea appears after this list.
b) Google's spell checker is an application that checks grammar and spelling in typed content such as documents. Big data collected from different dictionaries is used to check for spelling errors.
c) Google Flu Trends monitors flu activity worldwide. Big data collected from different populations can be analyzed to indicate whether a certain population is experiencing a flu outbreak or not.
d) Google Trends estimates how often a certain term is searched within a given time frame. This is done by analyzing big data on search queries (Securities and Exchange Commission, 2009).
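As referenced in item a), the following is a minimal sketch of the power-iteration idea behind PageRank over a toy link graph. The graph, damping factor, and iteration count are illustrative only and do not represent Google's actual implementation.

```python
# Simplified PageRank by power iteration over a toy link graph.
# The graph, damping factor, and iteration count are illustrative only.

def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}          # start with a uniform rank
    for _ in range(iterations):
        new_rank = {p: (1 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if not outgoing:                     # dangling page: spread rank evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
        rank = new_rank
    return rank

if __name__ == "__main__":
    toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
    for page, score in sorted(pagerank(toy_web).items(), key=lambda kv: -kv[1]):
        print(page, round(score, 3))
```

Pages that receive links from highly ranked pages end up with higher scores, which is the intuition behind ranking a website by the link structure around it.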
Users of these platforms post photos and updates, like comments, and share posts. The big data collected can be used for marketing tailored to a user's analyzed interests.
Exercise 5
Relational database management systems (RDBMSs) have scalability issues. They also require slow importation of data before it can be queried, so they cannot handle streaming data (Madden, 2012).
According to Han, Haihong, Le, and Du (2011), a NoSQL database is a type of non-relational database designed to handle big data because of its scalability. The main types are:
i. Key-value stores, which pair a unique key with an associated value to implement the data model.
ii. Document databases, which use document formats to store semi-structured data.
iii. Wide-column stores, which use columns to organize data tables.
iv. Graph data stores, which use nodes (and edges) to organize data.
MapReduce is a programming paradigm that allows processing to scale out across many servers; the word-count example sketched below illustrates its two phases. Its benefits include business scalability, cost effectiveness, flexibility, and fast, parallel processing of data (Madden, 2012).
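The following is a single-process sketch of the classic MapReduce word-count pattern in plain Python; a real deployment would distribute the map and reduce tasks across many servers rather than run them in one script.

```python
from collections import defaultdict

# Single-process illustration of the MapReduce word-count pattern;
# a real system distributes map and reduce tasks across a cluster.

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in the document."""
    return [(word.lower(), 1) for word in document.split()]

def reduce_phase(pairs):
    """Reduce: sum the counts for each word."""
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

documents = ["big data needs big storage", "green computing and big data"]
intermediate = [pair for doc in documents for pair in map_phase(doc)]
print(reduce_phase(intermediate))   # e.g. {'big': 3, 'data': 2, ...}
```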
Features of Amazon’s S3 service
i. Static website hosting and CloudFront integration
ii. Policies and access control lists (ACLs)
iii. Buckets for backup and restoration
iv. Logging for buckets
v. Life cycle management and versioning (see the sketch below)
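A hedged sketch of a few of these features through the AWS SDK for Python (boto3): the bucket name and region are placeholders, and valid AWS credentials are assumed to be configured in the environment.

```python
import boto3

# Sketch only: bucket name and region are placeholders, and valid AWS
# credentials are assumed to be configured in the environment.
s3 = boto3.client("s3", region_name="us-east-1")

s3.create_bucket(Bucket="example-backup-bucket")        # bucket for backup
s3.put_bucket_versioning(                               # enable versioning
    Bucket="example-backup-bucket",
    VersioningConfiguration={"Status": "Enabled"},
)
s3.put_object(                                          # store an object copy
    Bucket="example-backup-bucket",
    Key="backups/db.dump",
    Body=b"example payload",
)
```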
NoSQL is a big data analysis tool for structured and unstructured data. Apache Hadoop enables analysis through its distributed file system. For visualization, Datawrapper and Solver are examples of tools used.
Exercise 6
The health industry should use big data. Big data could help the industry by creating awareness, supporting medical research, and enabling the conversion of unstructured data into structured data (Feldman, Martin, & Skotnes, 2012). Social media produces a large amount of data every day, and big data analyzed from these media could be used for decision making in the marketing industry (Micro Focus Blog, 2017).
Just like hospitals, the finance industry also produces bulk data (Feldman, Martin, & Skotnes, 2012). The data collected from banking systems could be used to analyze bankers' activities.
Workbook for Week 7
Memory virtualization is the use of virtual addresses by an operating system to temporarily compensate for limited physical memory.
RAID 0 is a type of RAID that stripes data across disks and has no parity or redundancy.
Striping is the process of dividing data into blocks and storing the blocks on different storage devices in an array of independent disks; disk striping with parity allows for redundancy. Disk mirroring involves duplicating data onto different hard drives. The sketch below illustrates both placements.
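A rough sketch of the two ideas: fixed-size blocks are placed round-robin across disks (striping) or written identically to two disks (mirroring). The in-memory lists stand in for physical drives; this only illustrates block placement, not a real RAID implementation.

```python
# In-memory lists stand in for physical drives; this only illustrates
# how blocks are placed, not a real RAID implementation.

BLOCK_SIZE = 4  # bytes per block, kept tiny for the example

def split_blocks(data, size=BLOCK_SIZE):
    return [data[i:i + size] for i in range(0, len(data), size)]

def stripe(data, disk_count=3):
    """RAID 0 style: place blocks round-robin across disks (no redundancy)."""
    disks = [[] for _ in range(disk_count)]
    for i, block in enumerate(split_blocks(data)):
        disks[i % disk_count].append(block)
    return disks

def mirror(data):
    """RAID 1 style: write every block to both disks (full redundancy)."""
    blocks = split_blocks(data)
    return [list(blocks), list(blocks)]

print(stripe(b"ABCDEFGHIJKL"))   # blocks spread over 3 disks
print(mirror(b"ABCDEFGH"))       # identical copies on 2 disks
```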
Exercise 2 (Storage Design)
The storage repository affects the design of snapshots, granularity, and disk-usage performance. A repository should contain no more than five virtual machines, and each virtual machine's virtual disk is handled in its own repository to allow easy snapshots.
An ISS is an information system that aids businesses in information processing and decision making. Its components include hardware; software programs; a database for data storage; communication media; and device operators.
Caching happens when data is stored locally for rapid retrieval. The browser keeps copies of the data on the user's device so that later requests can be served without going back to the server, as the sketch below illustrates.
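A minimal sketch of this behaviour with a local cache: repeated requests for the same resource are served from the stored copy instead of going back to the origin. The fetch_from_server function here is only a stand-in for a real network call.

```python
# fetch_from_server is a stand-in for a real network call to the origin server.
cache = {}

def fetch_from_server(url):
    print(f"fetching {url} from the server")
    return f"<html>content of {url}</html>"

def get(url):
    """Serve from the local cache when possible; otherwise fetch and store a copy."""
    if url not in cache:
        cache[url] = fetch_from_server(url)
    return cache[url]

get("https://example.com/page")   # goes to the server
get("https://example.com/page")   # served from the local cache
```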
A storage area network (SAN) is a network that interconnects storage devices and servers. Network-attached storage (NAS) allows heterogeneous users to access data on centralized disks.
A SAN is advantageous over NAS because it is highly scalable and fast.
Network File System (NFS) and Common Internet File System (CIFS) are the protocols used in NAS. NFS is used in UNIX environments while CIFS is used in Microsoft Windows environments.
PART B
Exercise 3 Storage Design
IOPS per disk S = 1/Ts, where Ts is the disk service time.
Ts = seek time + rotational latency + transfer time
= (5/1000) + (0.5/(15000/60)) + (4/40)
= 0.005 + 0.002 + 0.1
= 0.107 s
S = 1/Ts
= 1/0.107
= 9.346 IOPS per disk
Dp = 4900/9.346
= 524 disks required to meet performance
Dc = 1/(100/1000)
= 10 disks required to meet the capacity need
Disks required to meet the application need = max(Dp, Dc) = max(524, 10)
= 524 disks (the script below reproduces this calculation)
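The same sizing calculation expressed as a short script so the inputs can be varied. The drive parameters (5 ms seek, 15,000 rpm, 40 MB/s transfer, 100 GB capacity) and workload figures (4,900 IOPS, 1 TB) are taken from the working above.

```python
# Drive and workload figures taken from the working above.
seek_time_s  = 5 / 1000            # 5 ms average seek time
rotational_s = 0.5 / (15000 / 60)  # half a rotation at 15,000 rpm
transfer_s   = 4 / 40              # 4 MB transferred at 40 MB/s

service_time = seek_time_s + rotational_s + transfer_s    # Ts ~ 0.107 s
iops_per_disk = 1 / service_time                          # ~ 9.35 IOPS per disk

disks_for_performance = 4900 / iops_per_disk              # ~ 524 disks
disks_for_capacity    = 1 / (100 / 1000)                  # 1 TB / 100 GB = 10 disks

required = max(disks_for_performance, disks_for_capacity)
print(f"Ts = {service_time:.3f} s, IOPS per disk = {iops_per_disk:.2f}")
print(f"Disks required = {required:.0f}")                 # 524, matching the working above
```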
Exercise 4 Storage evaluation
Fibre Channel over Ethernet (FCoE) allows Fibre Channel communication over Ethernet networks. It reduces switch complexity and consolidates input/output.
FCoE converges storage and server traffic onto the same network for transmission, hence cutting cabling, capital, and operational costs.
A virtual SAN (VSAN) is a partition of a SAN that exists logically.
IP SAN protocols allow servers to access storage devices over IP networks. FCIP tunnels Fibre Channel data between SANs, facilitating information transmission.
Introduction to Object-based and Unified Storage
a) Highly scalable with minimal impact on performance
b) Stores unique IDs generated for objects
c) Binary representation of data
d) I/O traverses the storage controller to the disk
a) Provides block, file, and object-based access within one platform
Workbook for Week 8
Exercise 1 Green computing
The greenhouse effect is the process by which gases in the atmosphere trap heat and warm the Earth's surface.
Greening IT products maximizes the efficiency of a product and reduces greenhouse gas
emissions from IT products.
Green IT is the practice of ensuring environmental sustainability through computing. It
is important for minimizing harmful IT products and helps promote the biodegradability
of products for environmental friendliness (Murugesan, 2008).
Exercise 2 Environmental sustainability
A greener environment is built by adopting approaches that make the IT life cycle
greener and environmentally sustainable.
IT has both positive and negative impacts on the environment. The manufacture of
computer components is hazardous to the environment, so it is necessary for IT experts
to deploy green IT for environmental sustainability, and education on green IT
prospects should be provided.
Exercise 3 Environmentally sound practices
Power usage effectiveness (PUE) is a metric for determining a data centre's energy
efficiency; it is the ratio of total facility power to IT equipment power. PUE's
reciprocal is data centre infrastructure efficiency (DCiE), also called data centre
efficiency (DCE), which measures a data centre's power efficiency. IT equipment power
and total facility power are used to calculate DCiE, as in the short worked example below.
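A worked sketch of the two metrics; the power figures are made up purely for illustration.

```python
# Illustrative figures only: total facility power and IT equipment power in kW.
total_facility_power_kw = 1600
it_equipment_power_kw = 1000

pue = total_facility_power_kw / it_equipment_power_kw    # power usage effectiveness
dcie = it_equipment_power_kw / total_facility_power_kw   # DCiE, the reciprocal of PUE

print(f"PUE  = {pue:.2f}")     # 1.60 - lower is better
print(f"DCiE = {dcie:.1%}")    # 62.5% - higher is better
```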
Universities offering green computing courses
i. University of Massachusetts- course name Green computing. Students are
introduced to the green computing age.
ii. American Public University System- course name Green Computing: Advanced
Topics. The course covers ways of making computing greener and more
efficient.
iii. Carnegie Mellon University- course name Green Computing. It is divided into
two parts that introduce green computing.
iv. Karlstad University- course name Green Computing. It covers environmental
perspectives on IT, green IT, and standards that relate to IT products.
v. University of Hertfordshire- course name Green Computing. The university
implements a carbon reduction strategy and a green IT strategy.
Exercise 4 Major Cloud APIs
Amazon offers the Elastic Compute Cloud (EC2) API for launching server instances in
Amazon data centres. Microsoft offers Graph APIs, Billing APIs, and Speech APIs.
GoGrid APIs allow clients to control their GoGrid environment programmatically.
Google APIs are used for monitoring Google Cloud usage and give users access
to Google applications such as Google Maps.
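As an illustration of the EC2 API mentioned above, here is a hedged sketch of launching a server instance with boto3. The AMI ID, instance type, and region are placeholders, and valid AWS credentials are assumed.

```python
import boto3

# Placeholders: the AMI ID, instance type, and region are illustrative only,
# and valid AWS credentials are assumed to be configured.
ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder AMI ID
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
)
print(response["Instances"][0]["InstanceId"])
```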
Part B
Exercise 1
The Restriction of Hazardous Substances (RoHS) directive states that products must
have a declaration of conformity, have a compliance file prepared by the manufacturer
or supplier, and be marked appropriately. EPEAT-registered products have lower
environmental impacts throughout their life cycle. Under the Energy Star 4.0 standard,
the power management technology in computers is used.
Exercise 2 Green cloud computing
Computation processing, storage, cooling systems and networks are the major power
consumers. Sophisticated circuit architectures have been introduced to enable
adjustment of microprocessor frequency. In computation processing and
virtualization, outsourcing to the cloud has been done to reduce resource
utilization. In cooling systems, designing micro-channel heat sinks that reduce
the thermal resistance between transistors and the cooling fluid could reduce
overheating of microprocessors. Network equipment manufacturers have developed
energy-saving technologies, power management policies have been formulated, and
smaller hard disk drives are being used.
Exercise 3: Cloud API Functionalities
sainsburys-nectar-api functionalities:
• Retrieval of account details, including name, currency, and points balance
• Retrieval of available offers, containing offer details such as the validity period
• Acquiring offers
• Dependencies
The OpenStack Compute API provides a Python API and a command-line script.
OpenStack API features include unlimited storage (and therefore high scalability) and
an object structure; built-in 3x data redundancy that increases availability; no
requirement for RAID, so it handles both reads and writes efficiently; real-time
visibility into client requests; elasticity; direct accessibility of content by
browsers; drive auditing; and account management, container management, and monitoring
utilities. OpenStack APIs are used to manage an OpenStack cloud. The activities that
can be carried out include launching server instances, creating images, assigning
metadata, and creating storage containers, as sketched below.
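A hedged sketch of some of these activities using the openstacksdk Python library: the cloud name, image, flavor, and network names are placeholders for values defined in your own deployment and clouds.yaml credentials file.

```python
import openstack

# Placeholders: the cloud, image, flavor, and network names must exist in
# your own OpenStack deployment (credentials come from clouds.yaml).
conn = openstack.connect(cloud="example-cloud")

image = conn.compute.find_image("ubuntu-22.04")
flavor = conn.compute.find_flavor("m1.small")
network = conn.network.find_network("private")

# Launch a server instance.
server = conn.compute.create_server(
    name="workbook-demo",
    image_id=image.id,
    flavor_id=flavor.id,
    networks=[{"uuid": network.id}],
)
server = conn.compute.wait_for_server(server)
print(server.status)

# Object storage: create a container and store an object.
conn.object_store.create_container(name="workbook-container")
conn.object_store.upload_object(container="workbook-container",
                                name="notes.txt", data=b"hello")
```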
References
Berkeley. (2018). What is data science? Retrieved from
https://datascience.berkeley.edu/about/what-is-data-science/
Feldman, B., Martin, E., & Skotnes, T. (2012). Big Data in healthcare: Hype and Hope.
Micro Focus Blog. (2017, October 10). How much data is created on the internet each day? Retrieved from https://blog.microfocus.com/how-much-data-is-created-on-the-internet-each-day/
Madden, S. (2012). From databases to big data. IEEE Internet Computing, (3), 4-6.
Securities and Exchange Commission. (2009). Annual report pursuant to Section 13 or 15(d) of the Securities Exchange Act of 1934. Retrieved from Google Inc. website: https://www.sec.gov/Archives/edgar/data/1288776/000119312510030774/d10k.htm
Manovich, L. (2011). Trending: The promises and the challenges of big social data. Debates in
the digital humanities, 2, 460-475.
Murugesan, S. (2008). Harnessing green IT: Principles and practices. IT Professional, 10(1).
Han, J., Haihong, E., Le, G., & Du, J. (2011, October). Survey on NoSQL database. In 2011 6th International Conference on Pervasive Computing and Applications (ICPCA) (pp. 363-366). IEEE.