Big Data Challenges Report - Data Science, Semester 1, University Name

Added on  2022/08/17

AI Summary
This report delves into the challenges associated with big data, providing insights into key concepts such as RDBMS (Relational Database Management System), Key-Value Pair Databases, and the foundational behaviors of MapReduce. The report defines RDBMS as a system designed for relational databases, emphasizing the use of rows and columns for data organization. It then explores Key-Value Pair Databases, highlighting their non-relational nature and suitability for horizontal scaling. Finally, it discusses the foundational behaviors of MapReduce, including scheduling, synchronization, code colocation, and error/fault handling, giving an overview of how MapReduce manages and processes large datasets. The report also includes references to academic papers and a patent related to the topics discussed.
Running head: BIG DATA CHALLENGES
Big data challenges
Institutional Affiliation
Student’s Name
Tutor
Date
Big data challenges
What is an RDBMS?
RDBMS stands for “Relational Database Management System”. An RDBMS is a DBMS designed specifically for relational databases. A relational database stores its data in a structured format of rows and columns, which makes the data easy to interpret because specific values within the database can be located and accessed easily. RDBMSes are hence a subset of DBMSes (Wu, Kumar, Chaudhuri, Jha, & Naughton, 2016).
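
To make the rows-and-columns idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module, with an in-memory SQLite database standing in for a full RDBMS. The `students` table, its columns, and the helper functions `build_db`, `course_of`, and `students_in` are hypothetical examples chosen for illustration; they do not come from the sources cited above.

```python
import sqlite3
from typing import List, Optional


def build_db() -> sqlite3.Connection:
    """Create an in-memory relational database with one hypothetical table."""
    conn = sqlite3.connect(":memory:")
    # Each column has a name and a type; each row is one record (one student).
    conn.execute(
        """CREATE TABLE students (
               student_id INTEGER PRIMARY KEY,
               name       TEXT NOT NULL,
               course     TEXT NOT NULL
           )"""
    )
    conn.executemany(
        "INSERT INTO students (student_id, name, course) VALUES (?, ?, ?)",
        [(1, "Asha", "Data Science"), (2, "Ben", "Statistics"), (3, "Chen", "Data Science")],
    )
    conn.commit()
    return conn


def course_of(conn: sqlite3.Connection, student_id: int) -> Optional[str]:
    """Locate one specific value: a named column in the row with a given key."""
    row = conn.execute(
        "SELECT course FROM students WHERE student_id = ?", (student_id,)
    ).fetchone()
    return row[0] if row else None


def students_in(conn: sqlite3.Connection, course: str) -> List[str]:
    """Select every row whose column value matches, returning one column."""
    rows = conn.execute(
        "SELECT name FROM students WHERE course = ? ORDER BY name", (course,)
    ).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    db = build_db()
    print(course_of(db, 2))                 # prints Statistics
    print(students_in(db, "Data Science"))  # prints ['Asha', 'Chen']
    db.close()
```

Because every value sits at the intersection of a named column and an identifiable row, a query only has to name the column and filter on the row, which is the "easy location and access" the paragraph above describes.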
What are Key-Value Pair Databases?
A key-value pair database is a type of nonrelational database that uses a simple key-value method to store data. Its data is stored as a collection of key-value pairs in which the key serves as a unique identifier. In this type of database, almost anything can serve as the key, and the values range from simple objects to complex compound objects. Key-value databases are often preferred because they allow horizontal scaling at scales that are difficult to achieve with other types of databases (Li & Masiero, 2019).
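
The following minimal Python sketch illustrates both ideas in the paragraph above: each node is simply a mapping from unique keys to arbitrary values, and horizontal scaling is approximated by hash-partitioning the keys across several nodes. The class names, the `node_count` parameter, and the choice of SHA-256 with modulo placement are assumptions made for illustration; they are not the design of any particular product or of the cited patent.

```python
import hashlib
from typing import Any, Dict, List, Optional


class KeyValueNode:
    """One storage node: a plain dictionary mapping unique keys to values."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._data: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        self._data[key] = value  # writing an existing key overwrites its value

    def get(self, key: str) -> Optional[Any]:
        return self._data.get(key)  # None if the key is absent

    def __len__(self) -> int:
        return len(self._data)


class ShardedKeyValueStore:
    """Spreads keys over several nodes by hashing each key (hash partitioning)."""

    def __init__(self, node_count: int) -> None:
        self.nodes: List[KeyValueNode] = [KeyValueNode(f"node-{i}") for i in range(node_count)]

    def _node_for(self, key: str) -> KeyValueNode:
        # A stable hash (unlike Python's per-process randomised hash()) so the
        # same key always maps to the same node.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest[:8], "big") % len(self.nodes)]

    def put(self, key: str, value: Any) -> None:
        self._node_for(key).put(key, value)

    def get(self, key: str) -> Optional[Any]:
        return self._node_for(key).get(key)


if __name__ == "__main__":
    store = ShardedKeyValueStore(node_count=3)
    # Values can be simple objects or compound objects.
    store.put("session:9f1c", "active")
    store.put("user:42", {"name": "Asha", "courses": ["Data Science", "Statistics"]})
    print(store.get("user:42")["name"])         # prints Asha
    print([len(node) for node in store.nodes])  # how many keys each node holds
```

Scaling out here means creating the store with more nodes; no join or schema has to span nodes, because every lookup touches exactly one key. Simple modulo placement does move most keys when the node count changes, which is why production key-value stores such as Amazon's Dynamo use consistent hashing instead.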
What are the foundational behaviors of MapReduce?
The foundational behaviors of MapReduce include scheduling, synchronization, code/data colocation, and error/fault handling.
Scheduling: each MapReduce job is broken into individual tasks for the map and reduce portions of the application. The map phase must be completed before the reduce function can be applied, and tasks are scheduled according to the number of nodes in the cluster; if there are more tasks than nodes, the remaining tasks wait until a node becomes free.
Synchronization: when multiple processes execute at the same time in a cluster, there must be a way of keeping things running smoothly, and synchronization mechanisms do this automatically. For example, the intermediate results of the map tasks are collected, shuffled, and sorted by key before they are passed to the reducers.
Code colocation: the most effective processing occurs when the code is located on the same machine as the data it needs to process. Through the scheduling process, the code and its matching data can be placed on the same node before execution.
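
The scheduling, synchronization, and colocation behaviors described above can be illustrated with a small, single-process word-count sketch in Python. It is a simulation under stated assumptions, not how Hadoop or any other engine is implemented: threads stand in for cluster nodes, the node assignment produced by the hypothetical `schedule_map_tasks` helper is returned for inspection rather than used to pick a real machine, and `slots_per_node`, the split names, and the node names are invented for the example.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


def map_words(text: str) -> List[Tuple[str, int]]:
    """Map function: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in text.split()]


def reduce_counts(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce function: combine all values emitted for one key."""
    return word, sum(counts)


def schedule_map_tasks(
    split_locations: Dict[str, str], nodes: List[str], slots_per_node: int = 1
) -> Dict[str, str]:
    """Assign each map task to a node, preferring the node that stores its split."""
    load = {node: 0 for node in nodes}
    assignment: Dict[str, str] = {}
    for split_id, holder in split_locations.items():
        if holder in load and load[holder] < slots_per_node:
            node = holder  # data-local: move the code to the data
        else:
            node = min(load, key=load.get)  # non-local: the split must be copied
        assignment[split_id] = node
        load[node] += 1
    return assignment


def run_job(
    splits: Dict[str, str],
    split_locations: Dict[str, str],
    nodes: List[str],
    num_reducers: int = 2,
) -> Tuple[Dict[str, int], Dict[str, str]]:
    """Run a word-count job: schedule, map, wait (barrier), shuffle, reduce."""
    assignment = schedule_map_tasks(split_locations, nodes)

    # Map phase: at most len(nodes) tasks run at once; extra tasks queue
    # until a worker is free, mirroring "more tasks than nodes".
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        futures = [pool.submit(map_words, splits[split_id]) for split_id in assignment]
        # Synchronization barrier: collect every map result before reducing.
        map_outputs = [future.result() for future in futures]

    # Shuffle: route each key to one reducer partition and group its values.
    partitions: List[Dict[str, List[int]]] = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in map_outputs:
        for word, count in pairs:
            partitions[zlib.crc32(word.encode("utf-8")) % num_reducers][word].append(count)

    # Reduce phase: each partition could be handled by a separate reduce task.
    counts: Dict[str, int] = {}
    for partition in partitions:
        for word in sorted(partition):  # keys reach the reducer in sorted order
            key, total = reduce_counts(word, partition[word])
            counts[key] = total
    return counts, assignment


if __name__ == "__main__":
    splits = {"s1": "big data big ideas", "s2": "data moves to code", "s3": "code moves to data"}
    locations = {"s1": "node-a", "s2": "node-b", "s3": "node-a"}
    counts, assignment = run_job(splits, locations, ["node-a", "node-b", "node-c"])
    print(assignment)              # prints {'s1': 'node-a', 's2': 'node-b', 's3': 'node-c'}
    print(sorted(counts.items()))  # prints [('big', 2), ('code', 2), ('data', 3), ('ideas', 1), ('moves', 2), ('to', 2)]
```

In the usage example, `s1` and `s2` run where their data already lives, while `s3` cannot, because `node-a` has no free slot, so it is sent to the idle `node-c` and its data would have to be copied there. The greedy "local if possible, otherwise least loaded" rule is only one simple policy chosen for readability.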
Error/fault handling: MapReduce engines must have robust error handling and fault tolerance, because a node can fail even when every node in the cluster is in place and working correctly; the engine has to detect the failure and reassign the affected tasks to other nodes (Van Aken, Pavlo, Gordon, & Zhang, 2017).
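
Fault handling can be sketched in the same spirit: when a task fails on a node, the engine notices the failure and re-runs the task on a different node. In this minimal Python simulation, the `NodeFailure` exception, the `execute_on` helper, the set of failed nodes, and the `max_attempts` limit are all hypothetical; real engines have their own failure-detection mechanisms and retry settings.

```python
from typing import Callable, Dict, List, Set, Tuple


class NodeFailure(Exception):
    """Raised when a simulated node is down while running a task."""


def execute_on(node: str, task: Callable[[], int], failed_nodes: Set[str]) -> int:
    """Hypothetical stand-in for dispatching a task to a remote node."""
    if node in failed_nodes:
        raise NodeFailure(f"{node} is down")
    return task()


def run_with_retries(
    tasks: Dict[str, Callable[[], int]],
    nodes: List[str],
    failed_nodes: Set[str],
    max_attempts: int = 3,
) -> Tuple[Dict[str, int], Dict[str, List[str]]]:
    """Run every task, rescheduling it on the next node each time a node fails."""
    results: Dict[str, int] = {}
    history: Dict[str, List[str]] = {}
    for i, (task_id, task) in enumerate(tasks.items()):
        # Round-robin starting node, then try the following nodes in order.
        attempts = min(max_attempts, len(nodes))
        candidates = [nodes[(i + k) % len(nodes)] for k in range(attempts)]
        history[task_id] = []
        for node in candidates:
            history[task_id].append(node)
            try:
                results[task_id] = execute_on(node, task, failed_nodes)
                break  # success: no further attempts for this task
            except NodeFailure:
                continue  # failure detected: reschedule on the next candidate
        else:
            # Every attempt failed, so the job cannot complete.
            raise RuntimeError(f"task {task_id} failed on {history[task_id]}")
    return results, history


if __name__ == "__main__":
    tasks = {"t0": lambda: 1 + 1, "t1": lambda: 2 * 3, "t2": lambda: 10 - 4}
    results, history = run_with_retries(tasks, ["node-a", "node-b", "node-c"], {"node-b"})
    print(results)  # prints {'t0': 2, 't1': 6, 't2': 6}
    print(history)  # prints {'t0': ['node-a'], 't1': ['node-b', 'node-c'], 't2': ['node-c']}
```

The sketch only retries tasks that fail while running. Real engines typically detect dead nodes through periodic heartbeats, and a node failure can also force completed map tasks whose output was stored on that node to be re-executed.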
References
Li, H., & Masiero, R. A. (2019). U.S. Patent No. 10,255,378. Washington, DC: U.S. Patent and Trademark Office.
Van Aken, D., Pavlo, A., Gordon, G. J., & Zhang, B. (2017, May). Automatic database management system tuning through large-scale machine learning. In Proceedings of the 2017 ACM International Conference on Management of Data (pp. 1009-1024).
Wu, X., Kumar, A., Chaudhuri, K., Jha, S., & Naughton, J. F. (2016). Differentially private stochastic gradient descent for in-RDBMS analytics. CoRR, abs/1606.04722.