ITECH 2201 Cloud Computing School of Science

Added on  2021-05-30

ITECH 2201 Cloud Computing
School of Science, Information Technology & Engineering
Workbook for Week 6 (Big Data)

Please note: Every effort was made to ensure the given web links are accessible. However, if they are broken, please use any appropriate video/article and refer to it in your answer.

Part A (4 Marks)

Exercise 1: Data Science (1 mark)

Read the article at http://datascience.berkeley.edu/about/what-is-data-science/ and answer the following:

What is Data Science?
Data science is the multidisciplinary integration of data reasoning, algorithm development and technology to solve difficult analytical problems (Kowolenko & Vouk, 2018).

According to IBM's estimate, what percentage of the data in the world today has been created in the past two years?
90% of the data in the world was created in the last two years.

What is the value of petabyte storage?
1 petabyte = 10^15 bytes of data.

For each course, both foundation and advanced, that you find at http://datascience.berkeley.edu/academics/curriculum/ briefly state (in 2 to 3 lines) what it offers.
Answer based on the given course description as well as the video. The purpose of this question is to understand the different streams available in Data Science.

CRICOS Provider No. 00103D
Foundation: The Master of Information and Data Science curriculum offers design options that can accelerate completion of the course. Students who are unfamiliar with object-oriented programming need to complete the Python for Data Science foundation course as part of their core course units.

Advanced: Data science roles and responsibilities differ, and the skills they require differ considerably. The advanced courses build a deeper understanding of the value and application of data science.

Exercise 2: Characteristics of Big Data (2 marks)

Read the following research paper from the IEEE Xplore Digital Library:
Ali-ud-din Khan, M.; Uddin, M.F.; Gupta, N., "Seven V's of Big Data understanding Big Data to extract value," American Society for Engineering Education (ASEE Zone 1), 2014 Zone 1 Conference of the, pp. 1,5, 3-5 April 2014
and answer the following questions:

Summarise the motivation of the author (in one paragraph).
The motivation for the article and its arguments comes from the fact that Big Data has become part of everybody's life and that it hides solutions to many problems across industries. Big Data provides the raw material for building the next great machine. The author argues that Big Data will ultimately take over new technology and the Internet world.

What are the 7 V's mentioned in the paper? Briefly describe each V in one paragraph.
1. Volume: This is the main feature that defines big data as "big."
2. Velocity: This can simply be defined as the speed at which data changes.
3. Variety: This can simply be defined as data arriving in different forms from different sources.
4. Veracity: It refers to the credibility of the data being used.
5. Variability: Variability is, first of all, different from variety.
6. Visualization: It refers to how data is presented to management for decision making.
7. Value: Value comes last. Organizations need to obtain some value in return for the tremendous effort and resources they invest.

Explore the author's future work by using reference [4] in the research paper. Summarise your understanding of how Big Data can improve the healthcare sector in 300 words.

The healthcare industry faces many challenges, from the spread of new diseases to maintaining good efficiency. Big data analysis can help to resolve these challenges. Because of the enormous amount of data available in the healthcare sector, such as health, clinical, financial, medical, research and development, administrative and operational data, meaningful insights can be found to improve the functioning of the industry. Healthcare companies have used big data to track hospital admission rates and analyse staff performance as part of their business intelligence plans. With predictive analysis, healthcare firms can cut costs and provide better care. Big data helps to reduce medication risks by improving administrative and financial performance, and helps to reduce readmissions. Medical insurance is complex and suffers from disputed and fraudulent claims. Big data analysis improves the efficiency of medical insurance claims by revealing trends and making the claims process more transparent. Patients get better returns on their insurance claims, and caregivers are paid faster. Growing EMR adoption is filling healthcare with data, and because carers need to keep historical patient records, the amount of data will only increase. This is good news for solution providers selling data storage and big data analysis.
These trends are seen in EMR, diagnosis, treatment effectiveness, operational effectiveness, vendor expenses and many other areas. What matters is finding the right use case for the data.

Exercise 3: Big Data Platform (1 mark)

In order to build a big data platform, one has to acquire, organize and analyse the big data. Go through the following links and answer the questions that follow the links:
Check the videos and change the wordings.
http://www.infochimps.com/infochimps-cloud/how-it-works/
http://www.youtube.com/watch?v=TfuhuA_uaho
http://www.youtube.com/watch?v=IC6jVRO2Hq4
http://www.youtube.com/watch?v=2yf_jrBhz5w
Please note: You are encouraged to watch all the videos in the series from Oracle.

How to acquire big data for enterprises and how can it be used?
According to the videos above, Big Data is now presented as a big problem, but there is a simple story behind the hype. For decades, companies have made business decisions based on transaction data stored in relational databases. Oracle offers a comprehensive and highly integrated product portfolio for acquiring and managing these different types of data and analysing them alongside existing data to find new insights and hidden relationships (AlMahmoud, Damiani, Otrok & Al-Hammadi, 2017).

How to organize and handle the big data?
To help research institutions collect, integrate, organize, and analyse data from a variety of sources, a comprehensive product portfolio is necessary. Before big data is loaded into a data warehouse, users need to process, filter, and transform large amounts of it.

What are the analyses that can be done using big data?
The infrastructure for analysing big data needs to support analytics such as statistical analysis and data mining on many data types stored in various systems; scale to extreme data volumes; provide fast response times; and automate decision making based on analytical models.

Part B (4 Marks)

Part B answers should be based on well-cited articles/videos. Name the references used in your answer. For more information read the guidelines as given in Assignment 1.

Exercise 4: Big Data Products (1 mark)
Google is a master at creating data products. Below are a few examples from Google. Describe the products below and explain how large-scale data is used effectively in them.

a. Google's PageRank
PageRank is a measure that evaluates the number and quality of links to a web page to determine the page's importance, expressed as an authority score on a 0-10 scale.

b. Google's Spell Checker
Google's spell checker is a very old feature that Google constantly improves. Understanding how it works helps in understanding the keyword research process and reputation/brand management.

c. Google's Flu Trends
A web service operated by Google.

d. Google Trends
Google Trends is a public web-based facility of Google Inc. that is based on Google Search. It shows how often particular search terms are entered, relative to total search volume, in various parts of the world and in different languages.
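To make the PageRank answer in (a) more concrete, the following is a minimal sketch of the idea that a page's importance depends on the number and quality of pages linking to it. It is not Google's production algorithm, and it does not reproduce the 0-10 scale mentioned above. The function name `pagerank`, the damping factor of 0.85, the iteration count and the four-page `toy_web` graph are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate page importance by repeated rank redistribution.

    links maps each page name to the list of pages it links to.
    The damping value and iteration count are illustrative assumptions.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal importance

    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: C is linked to by A, B and D.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph, page C ends up with the highest score because three pages link to it, while page D, which nothing links to, keeps only the baseline share. The scores are probabilities that sum to 1.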
Like Google, Facebook and LinkedIn also use large-scale data effectively. How?
As social platforms, these websites generate massive amounts of data. The data is aggregated to analyse user behaviour so that the platforms can provide recommendations. For example, Facebook suggests posts and pages that users may like, based on their search terms and the pages or products they have liked.

Exercise 5: Big Data Tools (2 marks)

Briefly explain why a traditional relational database (RDBMS) is not effectively used to store big data?
First, the size of data has grown to the petabyte (PB) range, where 1 PB = 1,024 TB. An RDBMS finds it very challenging to handle data at this scale. To cope, an RDBMS adds more central processing units and more memory to the database management system. Secondly, most data comes in semi-structured or unstructured formats from social media, video, text, email and audio (Zhao, Zhou, Li & Huang, 2018).

What is a NoSQL Database?
It offers transaction handling, horizontal scalability, and transactional semantics for easy management and inspection. NoSQL is a database design methodology that can adapt to various data models, including key-value, document, columnar and graph formats.

Name and briefly describe at least 5 NoSQL Databases
Cassandra: Originally developed by Facebook and now Apache open source software. It is well suited to social networking and cloud computing databases.
Lucene: A subproject of the Apache Software Foundation. It is an open source full-text search engine toolkit.
Oracle NoSQL: Oracle NoSQL Database is Oracle's distributed key-value NoSQL database.
HBase: An open source, non-relational distributed database.
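The horizontal scalability mentioned in the NoSQL answer above can be illustrated with a small sketch of a key-value store whose records are spread across several nodes by hashing the key. This is only a teaching example under stated assumptions: the names `shard_for` and `ShardedStore` are hypothetical, each "node" is just an in-memory Python dictionary, and real systems generally use more elaborate placement schemes (for example consistent hashing) plus replication.

```python
import hashlib


def shard_for(key, num_nodes):
    """Pick a node index for a key by hashing it (hypothetical helper)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


class ShardedStore:
    """Toy key-value store that spreads records over several nodes."""

    def __init__(self, num_nodes):
        # Each node is simulated by a plain dictionary.
        self.nodes = [dict() for _ in range(num_nodes)]

    def put(self, key, value):
        # The key alone decides which node holds the record.
        self.nodes[shard_for(key, len(self.nodes))][key] = value

    def get(self, key):
        # Reads go to the same node the key hashes to; missing keys give None.
        return self.nodes[shard_for(key, len(self.nodes))].get(key)


store = ShardedStore(num_nodes=3)
# Values are schema-free documents, so records need not share the same fields.
store.put("user:1", {"name": "Ada", "likes": ["cloud", "data"]})
store.put("user:2", {"name": "Lin"})
print(store.get("user:1"))
print([len(node) for node in store.nodes])
```

Because each key maps to exactly one node, capacity can grow by adding nodes rather than by adding CPUs and memory to a single server, which is the contrast with the RDBMS approach described above. Note that with this simple modulo scheme, changing the number of nodes moves most keys, which is one reason production systems use other placement strategies.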