
What is Data Science? Part B Exercise 1: What is Data Science?

   

Added on  2021-05-31

ITECH 2201 Cloud Computing
School of Science, Information Technology & Engineering
Workbook for Week 6 (Big Data)
Please note: All efforts were taken to ensure the given web links are accessible. However, if they are broken, please use any appropriate video or article and refer to it in your answer.
CRICOS Provider No. 00103D

Part A
Exercise 1: Data Science
Read the article at http://datascience.berkeley.edu/about/what-is-data-science/ and answer the following:

What is Data Science?
Data science is a field concerned with making sense of, and organizing, the avalanche of data created in recent years. It allows patterns in data to be identified, and helps people with advanced skills improve the ways in which society creates social and business value. The emergence of "big data" also enables us to understand phenomena more deeply, ranging from biological systems and economic behaviour to human social entities.

According to IBM estimation, what is the percent of the data in the world today that has been created in the past two years?
IBM estimates that ninety percent of the world's data was created in the last two years.

What is the value of petabyte storage?
A petabyte is one million gigabytes, that is, 10^15 (10 to the 15th power) bytes.

For each course, both foundation and advanced, that you find at http://datascience.berkeley.edu/academics/curriculum/, briefly state (in 2 to 3 lines) what they offer, based on the given course description as well as the video. The purpose of this question is to understand the different streams available in Data Science.

Foundation course:
The foundation (basic) curriculum provides students with the essential skills and knowledge of data science. It covers storing, searching, designing and analysing data in research work, and gives students knowledge of data visualization and practical applications (Khan, Uddin & Gupta, 2014).

Advanced course:
The advanced course builds a deeper understanding of the value and application of data science. Its analytical methods cover the complex skills needed to address big data problems, from experimental design to data visualization, helping students explore and understand the exact uses of data science.
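The petabyte answer in Exercise 1 can be checked with a line of arithmetic. This is a minimal sketch that assumes decimal (SI) units, where each prefix step is a factor of 1,000; binary units (a pebibyte is 2^50 bytes) would give a slightly different figure.

```python
# Decimal (SI) storage units: each prefix step is a factor of 1,000.
BYTES_PER_GB = 10**9    # 1 gigabyte
BYTES_PER_PB = 10**15   # 1 petabyte

# How many gigabytes fit in one petabyte?
gb_per_pb = BYTES_PER_PB // BYTES_PER_GB
print(f"1 PB = {gb_per_pb:,} GB")  # prints "1 PB = 1,000,000 GB"
```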

Exercise 2: Characteristics of Big Data
Read the following research paper from the IEEE Xplore Digital Library:
Ali-ud-din Khan, M.; Uddin, M.F.; Gupta, N., "Seven V's of Big Data understanding Big Data to extract value," 2014 Zone 1 Conference of the American Society for Engineering Education (ASEE Zone 1), pp. 1-5, 3-5 April 2014
and answer the following questions:

Summarise the motivation of the author (in one paragraph)
As the authors describe, the motivation comes from the fact that big data (BD) has become an important part of life and holds solutions to problems in many industries. The main reason for the paper is that the authors consider big data a key area of technology. The paper is written for the "BD ocean": billions of data records are generated every day, which has made big data a trend.

What are the 7 V's mentioned in the paper? Briefly describe each V in one paragraph.
1) Velocity: Velocity is discussed from two perspectives. The first is the speed at which data arrives, for which an enterprise must prepare its technology and database engines. The second is moving big data into large storage, which requires a quick response as the data arrives.
2) Variety: Big data comes in diverse forms, such as video and text, which is a main difference between big data and traditional data. The difficulty lies in this complexity, which can lead to erroneous data integration.
3) Volume: Volume refers to the size of the data created from sources including audio, text, video, research reports, spatial images, social networks, weather forecasts and crime reports, to mention a few.
4) Veracity: Veracity concerns the reliability of data. Unlike traditional data, which can be standardized, much big data comes directly from users, and the reliability of these users is low. Cleaning up data is therefore an important step for big data.
5) Volatility: For big data, volatility refers to the data retention policy. Retention is easily handled in a relational database, but it is more complex to manage as the variety, velocity and volume of data expand in the big data world.
6) Value: Value is a significant V because it is the desired outcome of big data analysis, and it also draws on the results of previous analysis.
7) Validity: Validity refers to the accuracy and correct usage of the data: data may be genuine and still not be valid for use in different situations (Corea, 2016).

Explore the author's future work by using the reference [4] in the research paper. Summarise your understanding of how Big Data can improve the healthcare sector in 300 words.
As the paper states, the cost of owning and managing data may exceed its value, so the governance mechanism depends to a large extent on the value of the data. Structures and strategies for information extraction need to be written and executed together, within set limits. Data can be arranged in tiers, with less risk to the data held at higher tiers; those tiers carry higher storage costs and higher levels of protection, so the protection at each tier should be matched to its cost.

The advent of digital technology has provided many benefits for healthcare providers. One of the key advances is the use of big data in the medical business. Big data can help participants in the medical industry run more effective operations and gain insight into patients and their well-being. The healthcare business faces a variety of challenges, from outbreaks of new diseases to maintaining optimal operational efficiency, and big data analytics can help address these challenges. By using large amounts of healthcare information, such as clinical, financial, research and development, operational and management data, big data can yield meaningful insights and improve the operational effectiveness of the business, so that healthcare companies can lower medical costs and provide better services.

Finding ways to treat diseases: some drugs seem to work for some people but not for others, and there are many things to observe in a single genome. It is impossible to study all of these in detail, but big data can help reveal unknown correlations, hidden patterns and insights by examining huge amounts of information. In future, it could be used to create drugs tailored to a patient's genome to obtain the best therapeutic effect. Combining all patients' electronic health records, dietary information, social factors and so on with DNA sequencing can support customized treatment and personalized medicine. Aurora Health Care has begun a proof of concept for this, and has been able to reduce its readmission rate by 10% and save $6 million annually (Abouelmehdi, Beni-Hessane & Khaloufi, 2018).

Exercise 3: Big Data Platform
In order to build a big data platform, one has to acquire, organize and analyse the big data.
Go through the following links and answer the questions that follow them:
http://www.infochimps.com/infochimps-cloud/how-it-works/
http://www.youtube.com/watch?v=TfuhuA_uaho
http://www.youtube.com/watch?v=IC6jVRO2Hq4
http://www.youtube.com/watch?v=2yf_jrBhz5w
Please note: You are encouraged to watch all the videos in the series from Oracle.

How to acquire big data for enterprises and how it can be used?
Based on the videos and Oracle's article, the main change to infrastructure is in the acquisition phase, where two major use cases must be considered. First, for social media updates, forum comments and blogs, companies can simply capture the data for overnight or weekly trend analysis. Second, they may need to update, study and store information for online profiles and continuously monitor sensors. In such cases a NoSQL database may be used to store the big data, because it is extensible and flexible, while the Hadoop Distributed File System (HDFS) may be used for batch data. In this approach, the system aims to capture all the data without parsing it or categorizing it into a fixed schema. As a result, the data can be easily accessed through simple keys by customer-facing applications.

How to organize and handle the big data?
Data stored in HDFS needs to be pre-processed, organized and converted so that it can be loaded into the data warehouse alongside traditional enterprise data and the data stored in NoSQL. Big data also always arrives in different formats. A procedure called sessionization is applied to specific data: it translates behaviour patterns and other related information into useful data, which can then be aggregated and loaded into relational database systems.

What are the analyses that can be done using big data?
Big data analysis is carried out in a distributed environment, because deeper analysis of big data, such as data mining and statistical analysis across the various systems that store different data, requires that infrastructure. Analysts can zoom in on large amounts of data, analytical models can make better decisions automatically, and responses to changing behaviour can be delivered faster (Jee & Kim, 2013).
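The session-building step described above (often called sessionization), followed by loading aggregates into a relational database and querying them, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Oracle's implementation: the clickstream events are hypothetical (user, timestamp in seconds, page) tuples, the 30-minute inactivity timeout is a hypothetical choice, and an in-memory SQLite database stands in for the relational system.

```python
import sqlite3
from itertools import groupby

SESSION_TIMEOUT = 30 * 60  # hypothetical inactivity gap (seconds) that ends a session

# Hypothetical raw clickstream events: (user_id, unix_timestamp, page)
events = [
    ("alice", 1000, "/home"), ("alice", 1300, "/products"),
    ("alice", 9000, "/home"),            # more than 30 min later: new session
    ("bob", 2000, "/home"), ("bob", 2100, "/cart"),
]

def sessionize(events, timeout=SESSION_TIMEOUT):
    """Group each user's events into sessions split by gaps longer than `timeout`.

    Yields (user_id, session_no, start_ts, end_ts, page_count) summary rows.
    """
    ordered = sorted(events, key=lambda e: (e[0], e[1]))
    for user, user_events in groupby(ordered, key=lambda e: e[0]):
        session_no, start, prev_ts, count = 0, None, None, 0
        for _, ts, _page in user_events:
            if prev_ts is not None and ts - prev_ts > timeout:
                # Gap too long: close the current session and start a new one.
                yield (user, session_no, start, prev_ts, count)
                session_no, start, count = session_no + 1, None, 0
            if start is None:
                start = ts
            prev_ts, count = ts, count + 1
        yield (user, session_no, start, prev_ts, count)

# Load the aggregated sessions into a relational table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sessions (user_id TEXT, session_no INTEGER, "
    "start_ts INTEGER, end_ts INTEGER, page_count INTEGER)"
)
conn.executemany("INSERT INTO sessions VALUES (?, ?, ?, ?, ?)", sessionize(events))

# Simple analysis with plain SQL: sessions and pages viewed per user.
for row in conn.execute(
    "SELECT user_id, COUNT(*), SUM(page_count) FROM sessions "
    "GROUP BY user_id ORDER BY user_id"
):
    print(row)  # ('alice', 2, 3) then ('bob', 1, 2)
```

At real scale the grouping step would typically run in a distributed framework over HDFS rather than in a single Python process; the sketch only shows the shape of the transformation from raw events to rows a relational system can query.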

Part B (4 Marks)
Part B answers should be based on well-cited articles/videos; name the references used in your answer. For more information, read the guidelines given in Assignment 1.

Exercise 4: Big Data Products (1 mark)
Google is a master at creating data products. Below are a few examples from Google. Describe the products below and explain how large-scale data is used effectively in each.

a. Google's PageRank
PageRank treats a hyperlink from one page to another as a "vote" for the linked page, and the weight of each vote depends on the importance of the page casting it, which is in turn computed from all the pages that link to it. In 2005, Google introduced a new link attribute, nofollow, as a countermeasure against spam in webmaster and blog links; links marked this way do not count as votes. A page that no other page links to receives no votes and therefore ranks low.

b. Google's Spell Checker
The spell checker suggests correct spellings of words. It can work as a standalone application and is also built into electronic dictionaries, search engines, word processors and email clients. During stem analysis, it separates words and compares them against known words.

c. Google Flu Trends
Google Flu Trends was a web service operated by Google that provided estimates of influenza activity for more than 25 countries. It made its historical estimates and current data available for download (Kościelniak & Puto, 2015).

d. Google Trends
Google Trends is a web-based tool built on Google Search data. It shows how often search terms are entered, across different languages and regions of the world.

Like Google, Facebook and LinkedIn also use large-scale data effectively. How?
It is well known that these sites generate huge amounts of data because they are social platforms, and all of this data is analysed against each user's behaviour patterns to produce recommendations. For example, Facebook uses users' activity to suggest items they may want to buy or events they may want to attend, which are likely to be posted on their page according to search criteria.

Exercise 5: Big Data Tools
Briefly explain why a traditional relational database (RDBMS) is not effectively used to store big data.
According to XYZ, there are three major reasons why an RDBMS is not efficient for storing big data. First, the size of data has increased drastically to the petabyte (PB) level, and the ability to process such a large amount
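The voting idea behind PageRank in Exercise 4(a) can be illustrated with a short power-iteration sketch. This is the textbook formulation with a damping factor, not Google's production system; the four-page link graph and the damping value of 0.85 are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Textbook PageRank by power iteration.

    `links` maps each page to the list of pages it links to. A page's score
    is split evenly across its outgoing links ("votes"); pages with no
    outgoing links spread their score evenly over all pages.
    """
    pages = sorted(set(links) | {p for targets in links.values() for p in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Every page starts with the baseline (1 - d) / n share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: distribute its vote across all pages.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        converged = sum(abs(new_rank[p] - rank[p]) for p in pages) < tol
        rank = new_rank
        if converged:
            break
    return rank

# Hypothetical four-page web: D links to C, but no page links to D.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = pagerank(web)
for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

C ranks highest because it collects votes from A, B and D, while page D, which no other page links to, ends up with only the baseline share (1 - d)/n = 0.0375. At web scale the same computation is run as a distributed job over billions of pages.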
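A common way to show how large-scale data helps spelling correction (Exercise 4(b)) is a frequency-based corrector in the style of Peter Norvig's well-known essay: generate every string one edit away from the input and pick the candidate seen most often in a large text corpus. This is a sketch of that general technique, not Google's actual spell checker, and the tiny word-count table below is a hypothetical stand-in for counts drawn from web-scale text.

```python
import string
from collections import Counter

# Hypothetical word counts; a real system would count words across a huge corpus.
WORD_COUNTS = Counter({
    "the": 500, "spelling": 40, "spell": 30, "checker": 25,
    "check": 60, "data": 80, "date": 20,
})

LETTERS = string.ascii_lowercase

def edits1(word):
    """All strings one deletion, transposition, replacement or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in LETTERS]
    inserts = [l + c + r for l, r in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)

def correct(word):
    """Return the word if known, else the most frequent known word one edit away."""
    word = word.lower()
    if word in WORD_COUNTS:
        return word
    candidates = [w for w in edits1(word) if w in WORD_COUNTS]
    # Fall back to the input when no known word is within one edit.
    return max(candidates, key=WORD_COUNTS.get) if candidates else word

for typo in ["speling", "chekcer", "dta", "xyzzy"]:
    print(typo, "->", correct(typo))
```

The more text the counts are drawn from, the better the frequencies separate likely corrections from unlikely ones, which is where large-scale data makes the difference.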
