Cloud Computing (Big Data) - Homework Assignment Solution

Added on 2020/05/28

AI Summary

This document provides a comprehensive solution to a cloud computing and big data assignment. It begins with an overview of data science and the significance of big data, including the 7 V's (Volume, Velocity, Variety, Veracity, Validity, Volatility, and Value). The solution explores data acquisition methods, storage, and analysis techniques, including NoSQL and HDFS. Part B delves into Google's Big Data products like PageRank and Spell Checker, followed by an examination of big data tools such as relational databases, NoSQL databases (Riak, CouchDB, HBase, Cassandra, Redis), MapReduce, and relevant software like Hadoop and Excel. Finally, the assignment highlights big data applications in the financial, insurance, and manufacturing industries, demonstrating their potential for data analysis and optimization.

Running head: CLOUD COMPUTING (BIGDATA)
Cloud Computing (Big Data)
Name of the student
Name of the Assignment
Authors Note

Part A (4 Marks)
Exercise 1:
1. Data science is an emerging field that combines social science, statistics, computer science,
information, and design.
2. According to IBM, 90% of the data in the world has been created in the past two years (Chen,
Mao & Liu, 2014).
3. One petabyte of storage is a million gigabytes (10^15 bytes) (Rajasekar, Dhanamani & Sandhya,
2015).
4. Foundation course: The foundation courses give students the basic knowledge and skills
that are relevant to information and data science. They cover how to design, analyze, store,
and retrieve data, among other topics.
Advanced course: The advanced courses focus more deeply on the applications and value of
data science. They cover more complex skills and analytical methods, such as design,
experiments, data visualization, and big data problem-solving.
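The storage arithmetic in answer 3 can be checked with a short sketch. It assumes decimal (SI) prefixes, which the answer does not state explicitly; binary prefixes (pebibytes and gibibytes) would give slightly different values. The function name is made up for the example.

```python
# Decimal (SI) storage units. This is an assumption: the answer does not say
# whether decimal or binary prefixes are meant.
GIGABYTE = 10**9   # bytes
PETABYTE = 10**15  # bytes

def petabytes_to_gigabytes(petabytes):
    """Convert a whole number of petabytes to gigabytes (decimal prefixes)."""
    return petabytes * PETABYTE // GIGABYTE

gigabytes_in_one_petabyte = petabytes_to_gigabytes(1)
```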
Exercise 2:
5. According to the author, the motivation comes from the current use of big data and from its
future. The fact that “Big Data is now the part of our life and Big Data hides a lot of
solutions to almost many problems of the industry” is the real motivation for researchers
and students. Another reason is that people are now part of the “Big Data Ocean”. Through this
paper, the author wanted to show the developing trends of Big Data and to discover its true
value.
6. The 7 V’s mentioned in the paper are:
1) Volume: In Big Data, Volume refers to the size of the data created from sources such as
audio, video, text, social networking, research studies, crime reports, weather
forecasting, and natural disasters.
2) Velocity: The velocity of data is discussed from two perspectives.
a) The velocity of incoming data, which the business must be prepared to handle with
suitable technology and database engine processing.
b) The speed at which big data moves into bulk storage, with a fast response as the
data arrives.
3) Variety: Big Data comes in different forms, such as images, video, and text, because it is
acquired through a direct interface with users.

4) Veracity: The veracity of Big Data concerns the reliability of the data. Cleansing the data
is therefore an important step in processing big data.
5) Validity: The validity of Big Data refers to “data accuracy, and correctness with regard to the
intended usage”. Big data is useful only when the relationship between the data elements is
valid for the intended consumption.
6) Volatility: Volatility depends on the data retention policy, whether for traditional data
or big data. A retention policy can be implemented easily in a relational
database.
7) Value: Unlike the other six V’s, which describe features and approaches, Value is the
desired outcome of big data analysis.
7. Big data has a lot of potential in the research community and in real-world industry.
Billions of data items are generated every day, which has made it a developing trend. Big data
hides solutions to many industry problems, which has made it a part of our life. The
researchers explain the 7 V’s, of which the seventh V, ‘Value’, is the actual outcome sought for
industry challenges and issues. With the Big Data Ocean, data has come to dominate
technology. The value of data should exceed the cost of owning or managing it. Governance
mechanisms depend largely on the value of the data. An enterprise needs to write and enforce
policies and structures for extracting the true value of its data.
Typically, data is placed in tiers. In short, the risk is lower for data at a higher tier.
It is therefore accepted that a higher tier brings a higher storage cost along with a higher
level of protection.
The research presents a vision of improved healthcare as a product of Big Data
utilization. Healthcare data will eventually support the growth of predictive medicine, make
efficient use of case studies, treatment histories, and disease records, inform prescriptions,
and ultimately improve healthcare. Although there is fear that Big Data may cause
unintended harm, there is also a belief that its benefits will outweigh such harm in the end.
A great deal of research is still needed to find solutions, reasons, and
information. Big Data utilization will therefore be future work for every researcher and
student.
Exercise 3:
8. Big data can be acquired from data partners, custom connectors, HTTP, and logs.
Once the data is collected, it is stored in the cloud, where the enterprise can use it (Di Martino et
al, 2014).
9. The data stored in NoSQL and HDFS is pre-processed, transformed, and then organized
for loading into the data warehouse. Some data requires sessionization, which groups all
related data into behavior patterns that contain a lot of useful information. The
results are then loaded into the relational database systems.
10. Cloud analytics, real-time analytics, statistical analysis, and data streaming are the kinds
of analysis that can be done using big data (Kambatla et al, 2014).
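The sessionization step mentioned in answer 9 can be sketched as follows. This is a minimal illustration rather than the pipeline the source describes: the 30-minute inactivity timeout, the (user, timestamp, action) event layout, the sample log, and the function name `sessionize` are all assumptions made for the example.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds of inactivity that end a session (assumed)

def sessionize(events, timeout=SESSION_TIMEOUT):
    """Group (user, timestamp, action) events into per-user sessions.

    A new session starts whenever the gap between two consecutive events
    of the same user exceeds `timeout` seconds.
    """
    by_user = defaultdict(list)
    for user, timestamp, action in events:
        by_user[user].append((timestamp, action))

    sessions = {}
    for user, user_events in by_user.items():
        user_events.sort()  # order each user's events by timestamp
        user_sessions = [[user_events[0]]]
        for previous, current in zip(user_events, user_events[1:]):
            if current[0] - previous[0] > timeout:
                user_sessions.append([])  # gap too long: open a new session
            user_sessions[-1].append(current)
        sessions[user] = user_sessions
    return sessions

# Hypothetical click log; timestamps are seconds since an arbitrary start.
log = [
    ("alice", 0, "view_home"),
    ("alice", 120, "view_product"),
    ("bob", 60, "view_home"),
    ("alice", 4000, "checkout"),
]
result = sessionize(log)
```

In this sample, the long pause before the last event of user "alice" splits her activity into two sessions, while "bob" has a single session. Each session can then be treated as one behavior pattern before loading.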
Part B (4 Marks)
Exercise 4: Big Data Products
11. Google’s PageRank determines the importance of all web pages from the links between
them: a page that is linked to by higher-ranked pages is itself ranked higher.
12. Google’s Spell Checker is used to check the spelling (Vivekavardhan, 2015).
13. Google Flu Trends tracks and maps flu-related searches and gathers information about flu
trends in a given region.
14. It is a market trend analysis tool that searches for and collects the hottest events or searches.
15. Facebook and LinkedIn suggest activities so that users can decide
whether they want to like products, posts on their page, or search terms.
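The idea behind PageRank in answer 11 can be illustrated with a small power-iteration sketch. It is a textbook simplification, not Google's production algorithm: the damping factor of 0.85, the number of iterations, and the three-page link graph are assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank scores by power iteration.

    `links` maps each page to the list of pages it links to. Every page
    must appear as a key; a page without outgoing links spreads its rank
    evenly over all pages.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small base score regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            targets = outgoing if outgoing else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share  # pass rank along each link
        rank = new_rank
    return rank

# Hypothetical three-page web: A and C both link to B, and B links back to A.
graph = {"A": ["B"], "B": ["A"], "C": ["B"]}
scores = pagerank(graph)
```

In this graph, page B receives links from two pages and ends up with the highest score, while page C, which nothing links to, ends up with the lowest.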
Exercise 5: Big Data Tools
16. A traditional relational database (RDBMS) uses a two-dimensional tabular data model,
which does not support multidimensional data because the capacity of an RDBMS is limited.
17. A NoSQL database is designed to handle the multiple types of data found in large-scale big
data applications (Hashem et al, 2015).
18. Riak – Written in Erlang and C, with some JavaScript.
CouchDB – Uses the HTTP/REST protocol. It is easy to use and provides database consistency.
HBase – Written in Java; supports tables with very large numbers of rows and columns.
Cassandra – Written in Java; supports large tables.
Redis – Written in C; uses a Telnet-like protocol and runs fast.
19. MapReduce is a programming model for parallel computation over large datasets (more than
1 TB) on a distributed system (Li et al., 2014). The programmer
specifies a map function that transforms a set of key-value pairs into a new set of key-value
pairs, and a reduce function that merges all the mapped values that share the same key.
20. Hadoop, CouchDB, Apache, Disco, Climatic or Earthquake prediction (Oweis, 2015).
21. Availability, Scalability, Durability, simple interface and Reduced Redundancy Storage
(Deka, 2014).
22. Excel is the spreadsheet program in Microsoft's Office suite and is powerful spreadsheet
software (Jagadish, 2014). It is easy to operate and produces powerful table and chart analysis,
and the XLSTAT add-in extends it for analyzing Excel data. However, its statistical methods are
insufficient for full statistical analysis, and it operates slowly.
Statistical Analysis System (SAS) is a very comprehensive system with capabilities for
accessing, managing, and analysing data. Internationally, SAS is regarded as standard
analysis software. The SAS System has a modular structure made up of more than
thirty modules. The features of the SPSS system are more convenient tables, more complete
statistical methods, and more intuitive output.
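The map and reduce steps described in answer 19 can be illustrated with a single-process word count. This is only a sketch of the programming model: real frameworks such as Hadoop distribute these phases across many machines, and the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are made up for the example.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) key-value pair for every word."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: merge all values that share the same key."""
    return key, sum(values)

def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

# Hypothetical input documents.
counts = word_count(["big data is big", "data is everywhere"])
```

Because each document is mapped independently and each key is reduced independently, both phases can run in parallel, which is what makes the model suitable for distributed systems.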
Exercise 6: Big Data Application
23. Financial Industry: Financial firms analyze questionnaire data from their existing
customers and segment them by population characteristics (George, Haas &
Pentland, 2014). They formulate product strategies for different customer groups across
insurance, investment, banking, asset management, and securities, so that they can provide
one-stop financial solutions to clients and obtain maximum value. Blockchains increase
big data security and support blockchain analytics and immutable compliance archives.
Insurance Industry: In this industry, the service needs to reduce the time taken to process
complex claims to 10 minutes. It also needs to eliminate millions of dollars lost to leakage and
fraud. The aim is a customer-centric, profitable company. Another important use is setting policy
premiums that earn a profit, cover the risk, and fit the customer's budget. This industry follows
the principle of risk.
Manufacturing Industry: The manufacturing industry uses advanced big data analysis
techniques to reduce costs and increase production. Information from production and
operations analysis is used to simplify processes in a more
advanced manner. For example, in biological and pharmaceutical production, manufacturers
monitor more than 200 variables. This ensures that all the ingredients are pure and that all
materials are created in line with compliance requirements. At present, enterprises use big data
to improve the quality and accuracy of their production, reduce costs, and
produce a large number of products.

References
Chen, M., Mao, S., & Liu, Y. (2014). Big data: A survey. Mobile Networks and
Applications, 19(2), 171-209.
Deka, G. C. (2014). A survey of cloud database systems. IT Professional, 16(2), 50-57.
Di Martino, B., Aversa, R., Cretella, G., Esposito, A., & Kołodziej, J. (2014). Big data (lost) in
the cloud. International Journal of Big Data Intelligence, 1(1-2), 3-17.
George, G., Haas, M. R., & Pentland, A. (2014). Big data and management. Academy of
Management Journal, 57(2), 321-326.
Hashem, I. A. T., Yaqoob, I., Anuar, N. B., Mokhtar, S., Gani, A., & Khan, S. U. (2015). The
rise of “big data” on cloud computing: Review and open research issues. Information
Systems, 47, 98-115.
Jagadish, H. V., Gehrke, J., Labrinidis, A., Papakonstantinou, Y., Patel, J. M., Ramakrishnan, R.,
& Shahabi, C. (2014). Big data and its technical challenges. Communications of the
ACM, 57(7), 86-94.
Kambatla, K., Kollias, G., Kumar, V., & Grama, A. (2014). Trends in big data analytics. Journal
of Parallel and Distributed Computing, 74(7), 2561-2573.
Li, M., Andersen, D. G., Park, J. W., Smola, A. J., Ahmed, A., Josifovski, V., ... & Su, B. Y.
(2014, October). Scaling Distributed Machine Learning with the Parameter Server.
In OSDI (Vol. 1, No. 10.4, p. 3).
Oweis, N. E., Owais, S. S., George, W., Suliman, M. G., & Snášel, V. (2015). A survey on big
data, mining: (tools, techniques, applications and notable uses). In Intelligent Data
Analysis and Applications (pp. 109-119). Springer, Cham.
Rajasekar, D., Dhanamani, C., & Sandhya, S. K. (2015). A Survey on Big Data Concepts and
Tools. International Journal of Emerging Technology and Advanced Engineering, 5(2),
80-81.
Vivekavardhan, J. (2015). Search Engines for User Centric Information Retrieval and Scholarly
Communication. International Journal of Advanced Library and Information
Science, 3(1), pp-201.