Big Data Research Project: Challenges, Applications, and Analytics


Added on  2022/09/01

AI Summary
This Big Data project provides a comprehensive overview of the subject, beginning with an introduction to the concept and its importance. It delves into the challenges associated with Big Data, including capturing, storing, and visualizing vast amounts of data. The project explores various storage solutions, the differences between HDD and SSD, and the limitations of current data visualization techniques. The project then examines the practical uses of Big Data, focusing on predictive analytics and user behavior analytics. It highlights how businesses leverage big data for customer insights, identifying trends, and improving operational performance. The project incorporates real-world examples, references relevant academic literature, and follows APA formatting guidelines.
Running head: BIG DATA
Big Data
Name of the Student
Name of the university
Author’s Note
Table of Contents
1 Introduction
2 Challenges in Big Data
2.1 Capturing and Storing Data
2.2 Data Visualization
3 Use of Big Data
3.1 Predictive Analytics
3.2 User Behavior Analytics
4 References
1 Introduction
Big data is the term for the vast volume of data, both structured and unstructured, that
inundates a business on a day-to-day basis. What matters, however, is not the volume of data
but what businesses do with it. Big data can be mined for insights that lead to better
management decisions and strategic moves (Chen, Mao & Liu, 2014). Big data is not just about
how much data an organization or individual has, but about what they do with it. Users can take
data from any source and analyze it to find answers that enable:
Cost savings
Time savings
Innovative product creation and customized deals
Smart decision making
By combining big data with high-powered analytics, the user can perform business-related
activities such as:
Identifying the root causes of failures, issues and defects in near real time
Generating offers at the point of sale based on the customer's purchasing patterns
(Marz & Warren, 2015)
Recalculating whole sets of uncertainties in minutes
Detecting fraudulent activity before it harms the company
Big data is a major concern for businesses. The proliferation of the Internet of Things (IoT)
and mobile apps has led to a huge increase in the volume of information that organisations
gather, manage and process. With big data comes the opportunity to extract new insights, for
every sector and for organisations small and large. Analytics at this scale calls for large data
sets, since ample data is needed to uncover hidden patterns and find answers without overfitting
(Gandomi & Haider, 2015). For deep learning in particular, the more good-quality data a business
has, the better the results. Today's exabytes of data open up endless possibilities to gather
insights that fuel innovation. From more precise forecasts to improved operating productivity
and enhanced consumer interactions, advanced
usage of big data and analytics propels developments that can transform our world: improving
futures, curing disease, protecting vulnerable people and managing resources.
Characteristics of Big Data
Volume: The name Big Data itself refers to a size that is enormous. The size of the data
plays a very important role in determining its value. Whether a particular data set can
actually be regarded as Big Data depends on its volume (Chen & Zhang, 2014). Volume is
therefore one characteristic that must be taken into account when dealing with Big Data.
Variety: The next dimension of big data is its variety. Variety refers to heterogeneous
sources and to the nature of the data, both structured and unstructured. In earlier days,
databases and spreadsheets were the only data sources that most applications considered.
Today, analytics systems also consider data in the form of documents, images, photographs,
monitoring devices, audio, PDFs and more. The abundance of unstructured data raises
particular problems for storing, processing and interpreting the data.
Variability: This refers to the inconsistency that the data can show at times, which hampers
the process of handling and managing the data effectively.
Velocity: Velocity refers to the speed at which data is generated. How quickly the data is
generated and processed to meet demand determines its real value. Big Data velocity deals
with how fast data flows in from sources such as enterprise systems, application logs,
networks, social networking platforms, mobile devices and sensors. The flow of data is
massive and continuous (Raghupathi & Raghupathi, 2014).
Benefits of Big Data
Access to Big Data provides many advantages; for example, businesses can draw on additional
information when making decisions.
Access to social data from search engines and platforms such as Google, Twitter and Facebook
enables companies to fine-tune their business strategies (Wang, Kung & Byrd, 2018).
It improves customer service: traditional customer feedback systems are being replaced by
new systems built on Big Data technologies.
Natural language processing techniques and Big Data are used in these new systems to read
and evaluate consumer responses.
It identifies any risks related to the service or product.
It improves operating performance.
Big Data technology can be used to create a staging area or landing zone for new data before
deciding which data should be moved to the data center. In addition, integrating Big Data
technology with a data warehouse helps an enterprise offload infrequently accessed data.
2 Challenges in Big Data
2.1 Capturing and Storing Data
HDDs (mechanical hard disk drives) and SSDs (solid-state drives) are the most prominent
storage devices, with HDDs being the foundation for the majority of large-scale data storage
(Bhosale & Gadekar, 2014). Both are often used as backup media by companies, with performance
density projected to rise by 20 per cent. The characteristics of HDDs are entirely different
from those of SSDs. In conventional storage architectures, input/output subsystems designed
for HDDs do not work with SSDs. Reliability risks such as magnetic faults and overheating,
together with disk access overhead, have made HDDs unattractive for large-scale data storage,
even though their price per gigabyte is fairly low (Hashem et al., 2015). SSDs, on the other
hand, handle input/output requests far faster than HDDs because they have no mechanical
components, which reduces access time and increases I/O throughput. SSDs are also more
resistant to physical shock and therefore more reliable. The problem with SSDs is the cost per
gigabyte: replacing all mechanical disks with SSDs for large-scale data storage is
prohibitively expensive.
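The cost trade-off described above can be made concrete with a little arithmetic. The sketch
below is only an illustration: the per-gigabyte prices and the capacity are hypothetical
placeholders, not figures taken from the sources cited here, and the name storage_cost is an
assumption introduced for the example.

```python
# Hypothetical prices in dollars per gigabyte: placeholders for illustration only.
HDD_PRICE_PER_GB = 0.02
SSD_PRICE_PER_GB = 0.08


def storage_cost(capacity_gb: float, price_per_gb: float) -> float:
    """Return the raw media cost of storing capacity_gb gigabytes."""
    return capacity_gb * price_per_gb


capacity_gb = 1_000_000  # one petabyte expressed in decimal gigabytes
hdd_total = storage_cost(capacity_gb, HDD_PRICE_PER_GB)
ssd_total = storage_cost(capacity_gb, SSD_PRICE_PER_GB)
extra = ssd_total - hdd_total  # additional spend if every disk were an SSD

print(f"HDD: {hdd_total:,.0f}  SSD: {ssd_total:,.0f}  extra for SSD: {extra:,.0f}")
```

Because the cost scales linearly with capacity, any per-gigabyte price gap is multiplied by
the full size of the data set, which is why the difference matters most at big data scale.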
With data growing at its current pace, the computing infrastructures of organizations and
businesses face major challenges from large and ever-increasing volumes of data. Whatever its
scale, data plays an important role in industry. Large-scale data collection can
generate value (Tiwari, Wee & Daryanto, 2018). For instance, Facebook increases its advertising
revenue by collecting the specific interests of its users and building profiles that tell
marketers which products those users are most interested in. Google likewise uses data from
Google Search, Gmail, YouTube and Google Hangouts accounts to monitor user activity. Despite
the many advantages that can be gained from large-scale data collection, the demand that big
data places on processing and storage represents a significant obstacle. Big data has clearly
outgrown existing architectures and is straining the limits of storage capacity and database
systems. Because of the sheer size of the data, conventional methods cannot support effective
analysis. The data approaches the limit of what can be processed, measured and retrieved. The
issue is not so much the quality of the data as its processing (Alharthi, Krotov & Bowman,
2017). Along with the growth in unstructured data, the number of data types has also increased:
images, audio, social networking content and mobile device data, to name only a few. With big
data, the most valuable use cases include improving data quality, extending existing data
resources and giving end users access through business analytics applications for data
exploration. Multidimensional data can be combined with big data analytics, so array-based
in-memory representation structures should be investigated. Integrating multidimensional
structures into big data systems involves developing query languages with multidimensional
extensions. With the prevalence of smart devices, data is being produced at an unprecedented
speed. Nevertheless, collecting, retrieving and analyzing such unstructured data still requires
an enormous amount of work in this growing field.
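As a rough illustration of the array-based in-memory representation mentioned above, the sketch
below stores a small multidimensional data set in a single flat list and answers a simple
slice-and-sum query. It is a minimal sketch under stated assumptions: the class name DenseCube,
its methods and the sample dimensions are all hypothetical and are not drawn from any particular
OLAP product or query language.

```python
from itertools import product


class DenseCube:
    """A dense multidimensional array stored in one flat, row-major list."""

    def __init__(self, dims):
        # dims maps a dimension name to its ordered list of member labels.
        self.names = list(dims)
        self.members = [list(dims[n]) for n in self.names]
        self.index = [{m: i for i, m in enumerate(ms)} for ms in self.members]
        # Row-major strides: the last dimension varies fastest.
        self.strides = [1] * len(self.names)
        for axis in range(len(self.names) - 2, -1, -1):
            self.strides[axis] = self.strides[axis + 1] * len(self.members[axis + 1])
        self.cells = [0.0] * (self.strides[0] * len(self.members[0]))

    def _offset(self, coords):
        # Translate one label per dimension into a position in the flat list.
        return sum(self.index[a][coords[n]] * self.strides[a]
                   for a, n in enumerate(self.names))

    def add(self, value, **coords):
        self.cells[self._offset(coords)] += value

    def total(self, **fixed):
        """Sum all cells whose coordinates match the fixed dimensions."""
        choices = [[fixed[n]] if n in fixed else ms
                   for n, ms in zip(self.names, self.members)]
        return sum(self.cells[self._offset(dict(zip(self.names, combo)))]
                   for combo in product(*choices))


# Hypothetical sample data, invented for this illustration.
cube = DenseCube({"region": ["north", "south"], "product": ["a", "b"], "month": [1, 2, 3]})
cube.add(10.0, region="north", product="a", month=1)
cube.add(5.0, region="north", product="b", month=2)
cube.add(7.0, region="south", product="a", month=1)
print(cube.total(region="north"))        # aggregates over product and month
print(cube.total(product="a", month=1))  # aggregates over region
```

The total method plays the role of a very small multidimensional query: fixing some dimensions
and aggregating over the rest is the kind of operation a query-language extension would expose.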
2.2 Data Visualization
Dynamics and scalability are two major challenges in visual analytics. Visualizing big data,
with its complexity and variability, is a huge challenge. Speed is the desired factor in big
data analysis. With big data it is not simple to design a new visualization tool with efficient
indexing. Cloud infrastructure and advanced graphical user interfaces can be combined with big
data to help manage its scalability (Kam et al., 2015). Visualization helps in reviewing
decisions at every stage of the process. Visualization issues remain a feature of data
warehousing and of work on OLAP. There is room for applications that visualize high-dimensional
data. Data visualization technologies must contend with unstructured data forms such as maps,
charts, trees, text and other metadata. Big data consists largely of unstructured forms.
Visualization should move closer to the data because of bandwidth
constraints and power demands, so that useful information can be extracted efficiently.
Visualization software should run in situ. Because of the huge data size, the need for massive
parallelization is a challenge in visualization (Skiba, 2014). The challenge for parallel
visualization algorithms is to decompose a problem into independent tasks that can be run
concurrently.
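The decomposition idea above can be sketched with a histogram, a common first step before
drawing a chart. In this hypothetical example the data is split into chunks, each chunk is
binned as an independent task, and the partial counts are merged. The function names are
assumptions, and threads are used only to keep the sketch portable; CPU-bound work in CPython
would typically use a process pool instead.

```python
from concurrent.futures import ThreadPoolExecutor


def bin_counts(values, lo, hi, bins):
    """Count how many values fall into each of `bins` equal-width bins."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        i = min(int((v - lo) / width), bins - 1)  # clamp v == hi into the last bin
        counts[i] += 1
    return counts


def parallel_histogram(values, lo, hi, bins=4, workers=4):
    """Split the data into chunks, bin each chunk concurrently, then merge."""
    chunk = max(1, (len(values) + workers - 1) // workers)
    parts = [values[i:i + chunk] for i in range(0, len(values), chunk)]
    merged = [0] * bins
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is an independent task; no task reads another task's data.
        for partial in pool.map(lambda p: bin_counts(p, lo, hi, bins), parts):
            for i, c in enumerate(partial):
                merged[i] += c
    return merged


data = list(range(100))  # hypothetical sample values
print(parallel_histogram(data, lo=0, hi=100))
```

The merge step is cheap because only the per-bin counts, not the raw values, are combined,
which is what makes this kind of task easy to run concurrently.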
Effective visualization of data is a vital part of the discovery process in the big data era.
Various dimensionality reduction approaches exist for the problems of high uncertainty and high
dimensionality in big data (Kreft et al., 2017), but they are not always applicable. The more
dimensions are visualized effectively, the higher the chance of identifying potentially
important trends, associations or outliers. Big data visualization raises several problems,
such as:
Visual noise: Most of the objects in the dataset are too close to each other. Users cannot
separate them on the screen as distinct objects.
Loss of information: The visible data set can be reduced, but doing so leads to loss of
information.
Large image perception: Data visualization techniques are constrained not only by the aspect
ratio and resolution of the device but also by the physical limits of human vision.
High rate of image change: Users observe the data but cannot react to the number of data
changes or their intensity on the display.
High performance requirements: These are hardly noticeable in static visualization, where the
requirements on visualization speed are lower.
Interactive and perceptual scalability are both challenges for big data visualization.
Visualizing every data point can lead to over-plotting and may overwhelm users' perceptual and
cognitive capacities; reducing the data through filtering or sampling can elide interesting
patterns or outliers. Querying large data stores can result in high latency and disrupt fluent
interaction. Because of the large size and high dimensionality of big data, data analysis is
challenging in Big Data applications (Aguiar-Pulido et al., 2016). Some of the current Big Data
visualization tools perform poorly in scalability, accessibility and response time. Uncertainty
in the data is itself difficult to represent effectively and may arise during the visual
analysis process.
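One way to picture the trade-off between over-plotting and sampling is binned aggregation:
instead of drawing every point or discarding some of them, points are counted per grid cell and
the counts are drawn. The sketch below is a minimal illustration of that idea, not a method
taken from the cited sources; the function name bin_points and the sample coordinates are
hypothetical.

```python
from collections import Counter


def bin_points(points, cell_size):
    """Aggregate (x, y) points into square grid cells and count each cell."""
    counts = Counter()
    for x, y in points:
        cell = (int(x // cell_size), int(y // cell_size))
        counts[cell] += 1
    return counts


# Hypothetical coordinates: two dense clusters and one isolated point.
points = [(0.1, 0.2), (0.4, 0.9), (1.5, 0.3), (1.7, 0.8), (1.9, 0.1), (9.5, 9.5)]
for cell, n in sorted(bin_points(points, cell_size=1.0).items()):
    print(cell, n)
```

Every point contributes to some cell, so the isolated point still appears as a cell with a
count of one, whereas random sampling could drop it entirely. The cost is that positions within
a cell are no longer distinguishable.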
3 Use of Big Data
3.1 Predictive Analytics
Predictive analytics is the enabler of big data: companies accumulate vast amounts of
real-time customer data, and predictive analytics, combined with customer insight, uses this
historical data to predict future events. Predictive analytics enables companies to use big
data collected in real time to shift from a historical perspective to a forward-looking view of
the customer. Business intelligence and big data provide realistic outcomes for predictive
analytics (Hazen et al., 2014). Business technologies today rake new customer, market, social
media and real-time device, server or product output data into mountains. Predictive analytics
is one means of exploiting all that information, gaining meaningful new insights and staying
ahead of the competition. Organizations employ predictive analytics in a range of areas, from
predictive modeling and data processing to the use of artificial intelligence and machine
learning techniques to automate company operations and discover new statistical patterns
(Schoenherr & Speier-Pero, 2015). In essence, this is machines learning from past experience
how to handle certain business activities differently, and offering fresh insight into how the
enterprise actually operates. Before considering the ways in which businesses and technology
firms use predictive analytics to save time, save money and gain an edge over the rest of the
industry, it is worth describing what predictive analytics involves.
An analytics specialist uses various methods, algorithms and data to identify trends. Such
trends have a significant influence on decision making across a broad spectrum. Businesses
describe this entire cycle as predictive analytics. In a predictive analytics framework, the
likely outcome of a decision is estimated by applying statistics to the data, which is Big Data
(Janke et al., 2016). Several predictive analytics methods are regarded as outstanding, and
there are also a number of recognition and automatic segmentation methods. Predictive analytics
tools are used widely in automated decision-making applications. Some of these tools are
sold commercially, while others are free of charge. Business intelligence carries out
quantitative analytics in combination with Big Data. In the transformed landscape of Big Data,
the data that businesses generate can be leveraged to their benefit. Big Data is of great
benefit in making calculated decisions in industry. In the past, researchers
and data analysts with strong quantitative skills were the ones who knew the methodology of
predictive analytics. Now, however, Big Data makes it straightforward to store and process
large volumes of data for rapid analysis. Marketing strategies have become more streamlined
with the use of applied predictive analytics, and the risks involved in various financial
industries have also dropped. Predictive analytics in Big Data is an integrated method for
incorporating incremental insight into business decisions with accuracy (Bradlow et al.,
2017). Businesses that want to ensure they prosper in these tough times should include
predictive analytics in their approach. They will also need support from qualified experts to
make the best use of these tools and solutions.
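The core idea of using past data to predict a future value can be shown with the simplest
possible model, an ordinary least-squares line. This is a minimal sketch rather than a
description of any particular predictive analytics product: the function name fit_line and the
monthly sales figures are hypothetical and were invented for the illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical monthly sales figures, invented for this illustration.
months = [1, 2, 3, 4, 5]
sales = [110.0, 120.0, 130.0, 140.0, 150.0]

slope, intercept = fit_line(months, sales)
forecast = slope * 6 + intercept  # extrapolate the trend to month 6
print(f"slope={slope:.1f} intercept={intercept:.1f} forecast={forecast:.1f}")
```

Real predictive models use many more variables and far more data, but the workflow is the
same: fit parameters to historical observations, then apply them to a case that has not yet
occurred.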
3.2 User Behavior Analytics
User Behavior Analytics (UBA) technology looks for usage patterns that indicate unusual or
anomalous behavior, irrespective of whether the behavior comes from an intruder, an insider, or
even a malicious program or other process. Although UBA will not stop hackers or outsiders
from breaching the network, it can detect their activity quickly and limit the damage. UBA is
closely related to Security Information and Event Management (SIEM). Traditionally, SIEM has
concentrated on analyzing events recorded in firewall, operating system and other device logs
to find important correlations, typically through predefined rules. By relying on perimeter
systems and OS logs rather than on the data itself, it is easy to miss insiders abusing their
access, as well as hacker behavior, because hackers have become very good at posing as regular
users once they are inside (Assunção et al., 2015). That is where UBA comes in. UBA discovers
user patterns by relying less on device events and more on individual user actions, and then
homes in on hackers when their behavior differs from that of legitimate users. The data sources
for UBA vary, and logs often feature prominently, but the analysis focuses on devices, user
accounts and user identities rather than on, for instance, hosts or IP addresses (Yeung,
2017). Some of these tools are a form of post-processing for data loss prevention (DLP) and
SIEM, in which the outputs of DLP or SIEM are the main source data; enhanced user identity data
and algorithms characterize these tools. Such applications may also gather logs and context
data themselves or from the SIEM, and apply various computational techniques to generate new
insight from the data.
User behavior analytics applications provide more sophisticated identification and
monitoring features than SIEM systems and serve two key functions. First, UBA software
is used to establish a baseline of the normal daily activities of the company and its
individual users. Second, deviations from this norm can then be identified. UBA uses big data
and deep learning algorithms to assess such anomalies in near real time (Deshpande, Sharma &
Peddoju, 2019). Although applying user behavior analytics to a single person may not be
helpful in detecting malicious activity, running it at scale can give an enterprise the
ability to spot ransomware or other potential cyber security risks, such as data exfiltration,
internal attacks and infected endpoints. UBA gathers various types of data, including user
functions and names, along with passwords, identities and permissions; user activity and
geographic location; and security alerts (Iqbal et al., 2018). Such data can be drawn from past
and current activity, and the analysis takes into account factors such as the resources used,
session length, connectivity and peer group activity against which to compare anomalous
behavior. UBA systems do not flag every anomaly as dangerous. Instead, they evaluate the
potential impact of the behavior. If the behavior involves less sensitive resources, it
receives a low impact score; if it involves something more sensitive, such as personally
identifiable information, it receives a higher impact score. In this way security teams can
prioritize what to follow up, while the UBA system automatically restricts or increases the
difficulty of authentication for the user showing anomalous behavior.
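The two steps described above, building a per-user baseline and then scoring deviations by
their potential impact, can be sketched with a simple z-score. This is a deliberately simplified
stand-in and not the method of any UBA product, which the text says relies on big data and deep
learning algorithms. The function names, the download counts and the impact weights are all
hypothetical.

```python
from statistics import mean, pstdev


def anomaly_score(history, observed):
    """Z-score of an observation against a user's own historical baseline."""
    baseline = mean(history)
    spread = pstdev(history)
    if spread == 0:
        # A perfectly constant history: any change at all is maximally unusual.
        return 0.0 if observed == baseline else float("inf")
    return abs(observed - baseline) / spread


def risk_score(history, observed, impact):
    """Weight the anomaly by the sensitivity (impact) of the resource involved."""
    return anomaly_score(history, observed) * impact


# Hypothetical daily file-download counts for one user.
history = [10, 12, 11, 9, 13]
today = 40

# Hypothetical impact weights: higher means a more sensitive resource.
low_impact = risk_score(history, today, impact=1.0)
high_impact = risk_score(history, today, impact=5.0)
print(f"low-sensitivity resource: {low_impact:.1f}  sensitive resource: {high_impact:.1f}")
```

Separating the anomaly score from the impact weight mirrors the prioritization described in
the text: the same unusual behavior ranks higher when it touches more sensitive data.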
4 References
Aguiar-Pulido, V., Huang, W., Suarez-Ulloa, V., Cickovski, T., Mathee, K., & Narasimhan, G.
(2016). Metagenomics, metatranscriptomics, and metabolomics approaches for
microbiome analysis: supplementary issue: bioinformatics methods and applications for
big metagenomics data. Evolutionary Bioinformatics, 12, EBO-S36436.
Alharthi, A., Krotov, V., & Bowman, M. (2017). Addressing barriers to big data. Business
Horizons, 60(3), 285-292.
Assunção, M. D., Calheiros, R. N., Bianchi, S., Netto, M. A., & Buyya, R. (2015). Big Data
computing and clouds: Trends and future directions. Journal of Parallel and Distributed
Computing, 79, 3-15.
Bhosale, H. S., & Gadekar, D. P. (2014). A review paper on big data and hadoop. International
Journal of Scientific and Research Publications, 4(10), 1-7.
Bradlow, E. T., Gangwar, M., Kopalle, P., & Voleti, S. (2017). The role of big data and
predictive analytics in retailing. Journal of Retailing, 93(1), 79-95.
Chen, C. P., & Zhang, C. Y. (2014). Data-intensive applications, challenges, techniques and
technologies: A survey on Big Data. Information Sciences, 275, 314-347.
Chen, M., Mao, S., & Liu, Y. (2014). Big data: A survey. Mobile Networks and Applications,
19(2), 171-209.
Deshpande, P. S., Sharma, S. C., & Peddoju, S. K. (2019). Predictive and Prescriptive Analytics
in Big-data Era. In Security and Data Storage Aspect in Cloud Computing (pp. 71-81).
Springer, Singapore.
Gandomi, A., & Haider, M. (2015). Beyond the hype: Big data concepts, methods, and analytics.
International Journal of Information Management, 35(2), 137-144.
Hashem, I. A. T., Yaqoob, I., Anuar, N. B., Mokhtar, S., Gani, A., & Khan, S. U. (2015). The
rise of “big data” on cloud computing: Review and open research issues. Information
Systems, 47, 98-115.
Hazen, B. T., Boone, C. A., Ezell, J. D., & Jones-Farmer, L. A. (2014). Data quality for data
science, predictive analytics, and big data in supply chain management: An introduction
to the problem and suggestions for research and applications. International Journal of
Production Economics, 154, 72-80.
Iqbal, R., Doctor, F., More, B., Mahmud, S., & Yousuf, U. (2018). Big data analytics:
computational intelligence techniques and application areas. Technological Forecasting
and Social Change, 119253.
Janke, A. T., Overbeek, D. L., Kocher, K. E., & Levy, P. D. (2016). Exploring the potential of
predictive analytics and big data in emergency care. Annals of Emergency Medicine,
67(2), 227-236.
Kam, H. R., Lee, S. H., Park, T., & Kim, C. H. (2015). RViz: a toolkit for real domain data
visualization. Telecommunication Systems, 60(2), 337-345.
Kreft, Ł., Botzki, A., Coppens, F., Vandepoele, K., & Van Bel, M. (2017). PhyD3: a
phylogenetic tree viewer with extended phyloXML support for functional genomics data
visualization. Bioinformatics, 33(18), 2946-2947.
Marz, N., & Warren, J. (2015). Big Data: Principles and best practices of scalable realtime data
systems. Manning Publications Co.
Raghupathi, W., & Raghupathi, V. (2014). Big data analytics in healthcare: promise and
potential. Health Information Science and Systems, 2(1), 3.
Schoenherr, T., & Speier-Pero, C. (2015). Data science, predictive analytics, and big data in
supply chain management: Current state and future potential. Journal of Business
Logistics, 36(1), 120-132.
Skiba, D. J. (2014). The connected age: big data & data visualization. Nursing Education
Perspectives, 35(4), 267-269.