Green Computing, Big Data, and Storage Design: A Workbook for Week 6-8
Added on  2023/06/04
AI Summary
This workbook covers topics such as green computing, big data, and storage design. It includes exercises on data science, the 7Vs of big data, NoSQL databases, and cloud APIs. The workbook also provides information on universities offering green computing courses and major cloud APIs.
Workbook for Week 6

Exercise 1
Data science refers to collecting data, processing it to extract value, and visualizing the processed results (Berkeley, 2018). According to IBM, an estimated 2.5 quintillion bytes of data are generated daily, and about 90% of the world's data had been created in the preceding two years (since 2016). The number of internet users grew from 2.4 billion in 2014 to 3.8 billion in 2017 (Micro Focus Blog, 2017).

Media work now requires large amounts of storage for editing, production, and archiving because image resolution keeps increasing. A petabyte equals 1,024 terabytes. As data volumes increase, so does the need for petabyte-scale storage (Manovich, 2011).

The course offers seven foundation courses, seven advanced courses, and a synthetic capstone course. Students who already know object-oriented programming take fewer units than students who do not.

Exercise 2
The author wants to clarify what big data is and to explain the seventh V, value, and its importance to the health industry. The author also explains the challenges and issues that value raises for an industry.

The 7Vs
Big data can be defined using the 7Vs: velocity, volume, veracity, visualization, variety, value, and variability. Volume refers to the amount of data. Velocity refers to the speed at which data is generated, processed, and accessed. Variety describes the different forms in which data can exist. Visualization is the use of charts to make the data easier to understand. Veracity refers to the accuracy of the data. Variability refers to inconsistency in the data. Value refers to the importance of the data.

Impact on healthcare
Big data is used to support research. Companies such as Genome Health Solutions and GNS Healthcare use genomics to transform translational research. Healthcare could be revolutionized by the ability to perform real-time analysis on data in motion. Converting unstructured data to structured data makes the information manageable by machines. With mobile phones, patients can access medical services on their own. Patient support systems are user-friendly and reduce errors, giving providers more time to attend to patients. Health awareness and education can be delivered by combining big data with the convenience of mobile phones.

Exercise 3 Big Data Platform
Big data can be acquired from Google searches, passport scans, supermarket barcode readings, CCTV footage, voice messages, and social media. Companies can use it in machine analysis to reveal patterns in human work and improve productivity. Big data is organized based on analysis of the captured data. That analysis is carried out using algorithms, big data modeling, and queries.

Exercise 4 Big Data Products
a) Google's PageRank is an algorithm that evaluates the quality of the links pointing to a website to determine the site's importance. Big data is used in page ranking through the number of clicks and posts that users give a website.
b) Google's spell checker is an application that checks grammar and spelling in typed content such as documents. Big data collected from different dictionaries is used to check for spelling errors.
c) Google Flu Trends monitors flu activity worldwide. Big data collected from different populations can be analyzed to indicate whether a given population has a flu outbreak.
d) Google Trends estimates how often a given term is searched over a given time frame. It does this by analyzing big data on search queries (Securities and Exchange Commission, 2009).

Users of social media platforms post photos and updates, like and comment on posts, and share them. The data collected can be used for marketing targeted at each user's analyzed interests.

Exercise 5
Relational database systems (RDBMS) also have scalability issues. They require data to be imported, which is slow, before it can be queried, so an RDBMS cannot handle streaming data (Madden, 2012). According to Han, Haihong, Le, and Du (2011), a NoSQL database is a non-relational database designed to handle big data because of its scalability.

Types:
i. Key-value stores pair a unique key with an associated value.
ii. Document databases use a document format to store semi-structured data.
iii. Wide-column stores use columns to organize data tables.
iv. Graph data stores use nodes to organize data.

MapReduce is a programming paradigm that lets processing scale out across many servers. Its benefits include scalability, cost-effectiveness, flexibility, and fast, parallel processing of data (Madden, 2012).

Features of Amazon's S3 service
i. Website hosting and CloudFront integration
ii. Policies and access control lists (ACLs)
iii. Buckets for backup and restoration
iv. Logging for buckets
v. Lifecycle management and versioning
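The MapReduce paradigm mentioned in Exercise 5 can be illustrated with a word count. The following is a minimal single-process sketch, not a Hadoop program: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample documents are hypothetical. In a real cluster the map and reduce steps would run in parallel on different servers.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: sum the counts collected for each word."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    """Run map, shuffle, and reduce over a list of documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    docs = ["big data needs storage", "big data needs analysis"]
    print(word_count(docs))
```

Because each call to `map_phase` depends only on its own document, and each key is reduced independently, both steps can be spread across machines, which is where the scalability comes from.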
NoSQL databases are big data tools for storing and analyzing structured and unstructured data. Apache Hadoop enables analysis through its distributed file system. For visualization, Datawrapper and Solver are examples of the tools used.

Exercise 6
The health industry should use big data. Big data could help the industry raise awareness, expand medical research, and convert unstructured data into structured data (Feldman, Martin, & Skotnes, 2012). Social media produces a large amount of data every day. Analysis of this data could support decision-making in the marketing industry (Micro Focus Blog, 2017). Like hospitals, the finance industry produces data in bulk (Feldman, Martin, & Skotnes, 2012). The data collected from banking systems could be used to analyze bankers' activities.
Workbook for Week 7

Memory virtualization is the operating system's use of virtual addresses to temporarily compensate for limited physical memory. RAID 0 is a RAID level that stripes data without parity. Striping is the process of splitting data into blocks and storing the blocks on different storage devices in an array of independent disks. Disk striping with parity adds redundancy. Disk mirroring duplicates data onto different hard drives.

Exercise 2 (Storage Design)
The storage repository affects the design of snapshots, granularity, and disk performance. A repository should hold no more than five virtual machines. Each virtual machine's virtual disk is handled in its own repository to allow easy snapshots.

ISS is an information system that aids businesses in information processing and decision making. Its components include hardware, software programs, a database for data storage, communication media, and device operators.

Caching stores data locally for rapid retrieval. A browser keeps copies of data on the user's device so that the next retrieval does not have to go back to the server.

A storage area network (SAN) is a dedicated network that interconnects servers and storage devices. Network-attached storage (NAS) allows heterogeneous clients to access data on a centralized storage device.
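The striping-with-parity idea described at the start of this workbook can be sketched in a few lines. This is a simplified illustration under stated assumptions, not a real RAID implementation: the function names are hypothetical, the parity block is always kept in the last position of each stripe (real RAID 5 rotates it across disks), and short data is padded with zero bytes.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_with_parity(data, n_data_disks, block_size):
    """Split data into stripes of n_data_disks blocks plus one parity block."""
    stripe_bytes = n_data_disks * block_size
    # Pad with zero bytes so the data fills a whole number of stripes.
    padded = data + b"\x00" * (-len(data) % stripe_bytes)
    stripes = []
    for start in range(0, len(padded), stripe_bytes):
        blocks = [
            padded[start + i * block_size: start + (i + 1) * block_size]
            for i in range(n_data_disks)
        ]
        # The parity block is the XOR of all data blocks in the stripe.
        stripes.append(blocks + [xor_blocks(blocks)])
    return stripes


def rebuild_block(stripe, lost_index):
    """Recover one lost block by XOR-ing the surviving blocks of the stripe."""
    survivors = [block for i, block in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


if __name__ == "__main__":
    stripes = stripe_with_parity(b"ABCDEFGH", n_data_disks=2, block_size=2)
    # Pretend the second disk failed and rebuild its block from the others.
    print(rebuild_block(stripes[0], lost_index=1))
```

Without the parity block (plain RAID 0), losing any one disk would lose its blocks for good. With parity, any single missing block in a stripe can be recomputed from the rest.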
SAN is advantageous over NAS because it is highly scalable and fast. Network File System (NFS) and Common Internet File System (CIFS) are the protocols used in NAS. NFS is used in UNIX environments, while CIFS is used in Microsoft environments.

PART B
Exercise 3 Storage Design
IOPS = 1/Ts
Ts = seek time + rotational latency + transfer time
Ts = (5/1000) + (0.5/(15000/60)) + (4/40)
   = 0.005 + 0.002 + 0.1
   = 0.107
S = 1/Ts = 1/0.107 = 9.346
Dp = 4900/9.346 = 524.3, rounded up to 525 disks required to meet performance
Dc = 1/(100/1000) = 10 disks required to meet capacity need
Disks required to meet application need = max(525, 10) = 525 disks

Exercise 4 Storage evaluation
FCoE (Fibre Channel over Ethernet) allows Fibre Channel communication over Ethernet. It reduces switch complexity and consolidates input/output. FCoE converges storage, device, and server traffic for transmission, which cuts cabling, capital, and operational costs. A virtual SAN (VSAN) is a logical partition of a SAN. IP SAN protocols allow servers to access storage devices. FCIP (Fibre Channel over IP) tunnels data between SAN facilities for information transmission.

Introduction to Object-based and Unified Storage
a) Highly scalable with minimal impact on performance
b) Stores unique IDs generated for objects
c) Binary representation of data
c) I/O traverse storage controller to disk
a) Provides block, file, and object-based access within one platform
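The disk-count arithmetic in Exercise 3 (Storage Design) can be checked with a short script. This is a sketch under stated assumptions: the function and parameter names are hypothetical, and the capacity figures are taken to be a 1 TB requirement (1000 GB) on 100 GB disks. The workbook does not state the units of the 4/40 transfer term, so the transfer time is passed in as a parameter in seconds. Passing 0.1 follows the figures as written. If the term was meant as a 4 KB block read at 40 MB/s, the transfer time would be about 0.0001 s and the performance-driven disk count would be far smaller.

```python
import math


def service_time(seek_s, rpm, transfer_s):
    """Disk service time in seconds: seek + average rotational latency + transfer."""
    rotational_latency_s = 0.5 / (rpm / 60)  # half a revolution on average
    return seek_s + rotational_latency_s + transfer_s


def disks_required(peak_iops, capacity_needed_gb, disk_capacity_gb, ts):
    """Return (disks needed overall, disks for performance, disks for capacity)."""
    iops_per_disk = 1 / ts
    for_performance = math.ceil(peak_iops / iops_per_disk)
    for_capacity = math.ceil(capacity_needed_gb / disk_capacity_gb)
    return max(for_performance, for_capacity), for_performance, for_capacity


if __name__ == "__main__":
    # Figures as written in the exercise: 5 ms seek, 15,000 rpm, transfer term 0.1.
    ts = service_time(seek_s=0.005, rpm=15000, transfer_s=0.1)
    print(disks_required(4900, 1000, 100, ts))

    # Alternative reading (assumption): 4 KB block at 40 MB/s, about 0.0001 s.
    ts_alt = service_time(seek_s=0.005, rpm=15000, transfer_s=0.0001)
    print(disks_required(4900, 1000, 100, ts_alt))
```

With `transfer_s=0.1` the sketch rounds 524.3 up to 525 disks for performance, since a fractional disk cannot be provisioned. The capacity requirement stays at 10 disks under either reading.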
Workbook for Week 8

Exercise 1 Green computing
The greenhouse effect is the process by which gases in the atmosphere trap heat and warm the earth's surface. Greening IT products maximizes product efficiency and reduces the greenhouse gas emissions they cause. Green IT is the practice of environmentally sustainable computing. It is important for minimizing harmful IT products and for promoting biodegradable, environmentally friendly products (Murugesan, 2008).

Exercise 2 Environmental sustainability
A greener environment is built by making the whole IT life cycle greener. IT has both positive and negative impacts on the environment. The manufacture of computer components is hazardous to the environment. IT experts therefore need to deploy green IT for environmental sustainability. Education on green IT prospects should also be provided.

Exercise 3 Environmentally sound practices
Power usage effectiveness (PUE) is a metric for determining a data centre's energy efficiency. Its reciprocal is data centre infrastructure efficiency (DCiE), which is also used to measure a data centre's energy efficiency. DCiE is calculated from IT equipment power and total facility power: DCiE is IT equipment power divided by total facility power, and PUE is total facility power divided by IT equipment power.

Universities offering green computing courses
i. University of Massachusetts: course name Green Computing. Students are introduced to the green computing age.
ii. American Public University System: the course is Green Computing: Advanced Topics. It covers ways of making computing greener and more efficient.
iii. Carnegie Mellon University: course name Green Computing. It is divided into two parts that introduce green computing.
iv. Karlstad University: course name Green Computing. It covers IT environmental perspectives, green IT, and standards that relate to IT products.
v. University of Hertfordshire: course name Green Computing. It implements a carbon reduction strategy and a green IT strategy.

Exercise 4 Major Cloud APIs
Amazon offers Elastic Compute Cloud (EC2) for launching server instances in Amazon data centres. Microsoft offers Graph APIs, billing APIs, and speech APIs. GoGrid APIs allow clients to control their GoGrid environment programmatically. Google APIs are used to monitor Google Cloud usage and give users access to Google applications such as Google Maps.

Part B
Exercise 1
The Restriction of Hazardous Substances (RoHS) Directive states that products must have a declaration of conformity, have a compliance file kept by the manufacturer or supplier, and be marked as appropriate. EPEAT products have a lower environmental impact throughout their life cycle. Under the Energy Star 4.0 standard, computers use power management technology.

Exercise 2 Green cloud computing
Computation processing, storage, cooling systems, and networks are the major power consumers. Sophisticated circuit architectures have been introduced to enable adjustment of microprocessor frequency. In computation processing and
virtualization, outsourcing from the cloud has been done to reduce resource utilization. In cooling systems, designing micro-channel heat sinks that reduce the thermal resistance between transistors and fluids could reduce overheating of microprocessors. Network equipment manufacturers have developed energy-saving technologies. Power management policies have been formed. Smaller hard disk drives are being used.

Exercise 3: Cloud API Functionalities
sainsburys-nectar-api functionalities
• Retrieval of account details including name, currency, and point balance
• Retrieval of available offers, including offer details such as validity period
• Acquiring offers
• Dependencies

OpenStack Compute API
It provides a Python API and a command-line script. OpenStack API features include:
• unlimited storage, which makes it highly scalable
• an object structure
• built-in 3x data redundancy, which increases availability
• no RAID requirement, so it handles both reads and writes efficiently
• real-time visibility into client requests
• elasticity
• direct access to content from browsers
• drive auditing
• account management utilities, container management, and monitoring

OpenStack APIs are used to manage an OpenStack cloud. Activities that can be carried out include launching server instances, creating images, assigning metadata, and creating storage containers.
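The PUE and DCiE metrics from Exercise 3 (Environmentally sound practices) are simple ratios of the same two measurements. The following is a minimal sketch: the function names and the sample power readings (1,500 kW total facility power, 1,000 kW IT equipment power) are hypothetical and chosen only to show the arithmetic.

```python
def pue(total_facility_kw, it_equipment_kw):
    """Power usage effectiveness: total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


def dcie(total_facility_kw, it_equipment_kw):
    """Data centre infrastructure efficiency as a percentage (reciprocal of PUE)."""
    if total_facility_kw <= 0:
        raise ValueError("total facility power must be positive")
    return it_equipment_kw / total_facility_kw * 100


if __name__ == "__main__":
    # Hypothetical readings for one data centre.
    total_kw, it_kw = 1500, 1000
    print(f"PUE  = {pue(total_kw, it_kw):.2f}")
    print(f"DCiE = {dcie(total_kw, it_kw):.1f}%")
```

A PUE closer to 1 (equivalently, a DCiE closer to 100%) means that more of the facility's power reaches the IT equipment rather than cooling, lighting, and other overhead.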
References
Berkeley. (2018). What is data science? Retrieved from https://datascience.berkeley.edu/about/what-is-data-science/
Feldman, B., Martin, E., & Skotnes, T. (2012). Big Data in healthcare: Hype and Hope.
Han, J., Haihong, E., Le, G., & Du, J. (2011, October). Survey on NoSQL database. In Pervasive computing and applications (ICPCA), 2011 6th international conference on (pp. 363-366). IEEE.
Madden, S. (2012). From databases to big data. IEEE Internet Computing, (3), 4-6.
Manovich, L. (2011). Trending: The promises and the challenges of big social data. Debates in the digital humanities, 2, 460-475.
Micro Focus Blog. (2017, October 10). How Much Data is Created on the Internet Each Day? Retrieved from https://blog.microfocus.com/how-much-data-is-created-on-the-internet-each-day/
Murugesan, S. (2008). Harnessing green IT: Principles and practices. IT professional, 10(1).
Securities and Exchange Commission. (2009). Annual report pursuant to section 13 or 15(d) of the Securities Exchange Act of 1934. Retrieved from Google Inc website: https://www.sec.gov/Archives/edgar/data/1288776/000119312510030774/d10k.htm