A Report on Big Data and Data Analytics: Trends, Issues, and Solutions

Added on 2023/06/05

AI Summary

This report delves into the complexities of big data and data analytics, focusing on the challenges of classifying and analyzing large datasets, particularly in the context of network intrusion traffic. It explores the core characteristics of big data – volume, variety, and velocity – and proposes a network topology combined with machine learning techniques for effective classification. The report examines the existing literature on big data analytics, discussing the evolution of data storage and processing technologies. It highlights the importance of understanding big data traits for efficient and cost-effective analysis, and discusses the issues related to data veracity. Furthermore, the report outlines a methodology for addressing big data classification issues, including the creation of an effective network topology using technologies like HDFS and cloud computing. It also addresses the communication and security challenges that arise in managing big data, emphasizing the need for machine learning techniques to optimize data processing and ensure data integrity. The report concludes by providing a detailed overview of the challenges and solutions in the field of big data and data analytics.

Running head: BIG DATA AND DATA ANALYTICS 1
Big Data and Data Analytics
Student’s Name
Course
Date

Abstract
Classifying and analyzing big data remains a challenge for most organizations. This
proposal focuses on the problem of classifying big data in network intrusion traffic. It
anticipates the difficulties that will arise when selecting techniques for classifying big
data. With more data being collected every day, it is crucial to understand the characteristics
of big data. The proposal concentrates on three of these characteristics: velocity, variety,
and volume. It proposes a network topology that will be combined with machine learning
techniques to support the classification of big data, and it explains the issues expected to
arise from combining the network topology with these technologies.

Table of Contents
Abstract
Table of Contents
Introduction
Literature Review
Study Type and Design
Methodology
Managing Big Data
Learning Big Data
Interaction of Big Data with Users
Conclusion
Reference

Introduction
Big data classification remains a major challenge for IT professionals all over the world.
Big data can be described as large volumes of data, both structured and unstructured, that
overwhelm a business on a daily basis (John, 2014). In simple terms, big data consists of data
sets so large that traditional processing software cannot record, manage, and process them
with minimal latency. Established sources of big data include sensors, devices, networks, log
files, and audio and video. As technology evolves, newer sources include transactional
applications, the web, and social media. Much of big data is produced in real time and on a
very large scale (Provost & Fawcett, 2013). Today’s phones are equipped with GPS technology
that generates data, and data is also generated whenever we are online. People leave digital
footprints whenever they carry out a transaction that involves a digital action. These digital
footprints continue to grow every day as more people access the digital world. Sensors fitted
to machinery in industries and factories gather and transmit data every second. Data is also
generated when people communicate through social media apps every day. All of this is a clear
sign that data is growing at a rapid rate and that the analysis and classification of big data
continue to be a challenge.
Literature Review
The analysis of big data is crucial to analysts, researchers, and business users because it
allows them to make better and faster decisions using data that was previously inaccessible
or unusable (Zikopoulos & Eaton, 2011). With better classification and analytics methods
such as text analytics, machine learning, predictive analytics, natural language processing, and
data mining, large organizations are able to analyze previously untapped data sources,
independently or together with their existing enterprise data. Through this, they are able to achieve
new insights that lead to better and faster decisions. Big data continues to evolve and has
influenced many of the ongoing waves of digital transformation. Technology will never stop
evolving, which means that more data will be generated as time goes on. Before computers and
databases, data transactions were handled manually. Paper transactions, customer records, and
archive files were all big data. The introduction of computers and databases made it possible
to store and organize big data in a manner that was easily accessible (Manyika et al., 2011).
This led to the saying that information was available at the snap of a finger.
Big data is currently characterized by three aspects: volume, variety, and velocity. At a
certain point, as the velocity, volume, and variety of data increase, current technologies and
techniques become unable to handle, process, and store the data. It is in these situations that
the data is referred to as big data. Many applications today suffer from the big data problem
(McAfee, Brynjolfsson, Davenport, Patil & Barton, 2012). For example, when many people access
a website at once, an error may appear on the screen indicating that the requested page could
not be found. The error arises because the web server cannot process such a large amount of
data at one time. Many applications that people use on their smartphones and computers have
fallen victim to this. New technologies have been created to help with big data analytics,
especially in applications (Boyd & Crawford, 2012). These technologies have been shown to help
greatly in the classification of big data. They include the Hadoop Distributed File System
(HDFS), cloud technology, and the Hive database. This proposal will focus on the techniques and
technologies for classifying big data and on establishing that the data qualifies as big data.
It will also focus on some of the challenges and issues associated with integrating current
networking technologies
and machine learning methods that are used to resolve the big data problem in network
intrusion prediction. Network intrusion detection and prediction are time-sensitive
applications that are greatly affected by big data problems. These applications must also keep
pace with growth in the big data domain, which explains why they are so strongly affected by
big data issues (Lazer, Kennedy, King & Vespignani, 2014).
Study Type and Design
The first crucial step will be establishing the existence of big data. For data to be
classified as big data, it must satisfy the three characteristics that define it: volume,
variety, and velocity. This step will be important in ensuring that big data analytics is
adequate and cost-effective. The next step will be exploring strategies that help in managing
the big data. This may be done by creating an effective network topology (Russom, 2011).
Learning the characteristics of big data will be important for its classification. The study
will also explore the challenges that existing learning techniques still face in handling the
big data classification problem. With all these issues addressed, it will be possible to
resolve the existing challenge of big data classification.
Methodology
To solve the problem of classifying big data, the first step will be establishing the
existence of big data. For the network traffic data to be regarded as big data, it must
satisfy the characteristics of big data (Srinivasa & Bhatnagar, 2012). This step will be
important in ensuring that the analysis of big data is efficient and cost-effective. Detecting
big data in the early stages is important because it prevents organizations from deploying
big data technologies unnecessarily (Maltby, 2011). For better classification of big data, it
will be important to understand the data characteristics used for classification. These
characteristics are volume, variety, and velocity. Volume will be used to determine how
large the data is. By definition, big data is a volume of data so large that existing software
is unable to process, handle, and store it. Volume will also be used to track the growth of
data size. The size of the data will be important because it will be used to establish the
value of the data. As technology develops and data is produced through human interaction over
the internet, machines, and networks, the amount of data to be analyzed is expected to be
massive. This is expected to be an issue in classifying big data. Data that organizations
collect from sources such as business transactions, social media, and machine-to-machine
communication will be classified as big data because it is collected in large volumes
(Rajaraman, 2016).
Variety refers to the different sources and forms of the data recorded. Big data can be
structured or unstructured. Data will be collected in a variety of forms such as emails, audio,
video, photos, documents, and readings from monitoring devices. Data will also be stored from
traditional sources such as spreadsheets and databases. The variety of unstructured data is
expected to pose issues for storage, analysis, and data mining. To be classified as big data,
the data is expected to exhibit this variety. Velocity describes how fast data flows from its
sources (LaValle, Lesser, Shockley, Hopkins & Kruschwitz, 2011). Big data is expected to flow
at a high velocity in real-time operations (LaValle et al., 2011). Technologies such as sensors
and RFID tags produce large volumes of data travelling at high velocity in real-time
operations. The flow of such data is expected to be massive and continuous. Real-time data
will help in characterizing big data because of the large volumes travelling at high velocity.
A newer characteristic that big data is expected to satisfy is veracity.
Veracity refers to the biases, noise, and abnormalities in data. Because veracity is
difficult to define, the team analyzing the data is expected to collaborate to ensure that the
data is clean and that unclean data does not accumulate in the system. As data travels from
various sources, it is expected to pick up certain impurities.
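The volume, variety, and velocity checks described above can be measured directly on a batch
of captured traffic. The following Python sketch is only illustrative: the `TrafficRecord`
fields, the function names, and the caller-supplied thresholds are assumptions made for this
example and do not come from the proposal. Veracity is not measured here.

```python
from dataclasses import dataclass


@dataclass
class TrafficRecord:
    """One captured record; the fields are hypothetical, chosen for illustration."""
    timestamp: float   # seconds since capture started
    size_bytes: int    # payload size
    kind: str          # data form, e.g. "log", "video", "sensor"


def summarize_three_vs(records):
    """Return simple volume, variety, and velocity measurements for a batch."""
    if not records:
        return {"volume_bytes": 0, "variety": 0, "velocity_bytes_per_s": 0.0}
    volume = sum(r.size_bytes for r in records)
    variety = len({r.kind for r in records})
    duration = max(r.timestamp for r in records) - min(r.timestamp for r in records)
    # If every record shares one timestamp, treat the batch as a one-second window.
    velocity = volume / duration if duration > 0 else float(volume)
    return {"volume_bytes": volume, "variety": variety, "velocity_bytes_per_s": velocity}


def looks_like_big_data(summary, max_volume, max_variety, max_velocity):
    """True when all three measurements exceed what the current system handles.

    The thresholds are assumptions supplied by the caller, not values from the proposal.
    """
    return (summary["volume_bytes"] > max_volume
            and summary["variety"] > max_variety
            and summary["velocity_bytes_per_s"] > max_velocity)
```

In practice the thresholds would reflect the capacity of the existing storage and processing
systems, since the proposal treats data as big data once those systems can no longer cope.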
Managing Big Data
Managing big data will require the creation of an effective network topology. An
adequate distributed file system will make it possible to capture, store, and analyze the
network traffic for intrusion prediction. The cardinality issue in big data can be addressed
with current technologies such as HDFS and the public cloud (Suthaharan, 2014). These two
technologies will be combined to create an effective network topology containing a storage
base that switches adaptively according to big data processing requirements. The integration
of these technologies is expected to bring new challenges, which must be handled efficiently.
The new model will comprise four units: the user interaction and learning system, HDFS, the
cloud computing repository system, and the network traffic recording system. The network
traffic recording system is expected to record the network traffic and stream the collected
data to HDFS and the cloud computing repository system in real time, depending on the need for
extra storage. HDFS also requires the Hive database for storage of the information. The user
interaction and learning system is expected to learn and control the extra storage and data
needs.
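The adaptive switching between HDFS and the cloud repository can be pictured with a small
routing sketch. It is a simplified illustration under stated assumptions: the `Store` class
and `route_record` function are hypothetical in-memory stand-ins, not HDFS or cloud client
APIs, and the rule "use the cloud only when the primary store is full" is one possible reading
of streaming data according to the need for extra storage.

```python
class Store:
    """A capacity-limited store; a stand-in for HDFS or a cloud repository."""

    def __init__(self, name, capacity_bytes):
        self.name = name
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.records = []

    def has_room(self, size_bytes):
        return self.used_bytes + size_bytes <= self.capacity_bytes

    def write(self, record, size_bytes):
        self.records.append(record)
        self.used_bytes += size_bytes


def route_record(record, size_bytes, primary, overflow):
    """Write to the primary store, spilling to the overflow store when it is full."""
    target = primary if primary.has_room(size_bytes) else overflow
    if not target.has_room(size_bytes):
        raise RuntimeError("no store has room for this record")
    target.write(record, size_bytes)
    return target.name
```

A recording unit would call `route_record` for every captured record, with the HDFS stand-in
as `primary` and the cloud stand-in as `overflow`.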
During the management of big data, communication challenges are expected to arise. In
computing, communication is always a major concern compared with the cost of processing data.
In the topology created by combining HDFS and the public cloud, communication cost is expected
to be a major concern. The aim will be to reduce the
communication cost of the topology while ensuring that the extra storage and data
requirements for big data processing are met by the public cloud. Two network factors that are
expected to influence the interaction between the cloud server and the client are bandwidth
and latency. Addressing these challenges will, in turn, affect the timing requirements of big
data processing at the HDFS and the user interaction and learning system units. To solve this,
machine learning techniques will be needed, while still keeping the problems and challenges
in focus.
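The roles of bandwidth and latency can be made concrete with a first-order estimate of
transfer time. This sketch assumes a deliberately simple model, a fixed latency plus payload
size divided by bandwidth, and ignores protocol overhead and congestion. The function names
are illustrative and not part of the proposal.

```python
def transfer_time_seconds(size_bytes, bandwidth_bytes_per_s, latency_s):
    """Estimate the time to move one payload: fixed latency plus size over bandwidth."""
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return latency_s + size_bytes / bandwidth_bytes_per_s


def batched_transfer_time(sizes, bandwidth_bytes_per_s, latency_s):
    """Sending records as one batch pays the latency once instead of once per record."""
    return transfer_time_seconds(sum(sizes), bandwidth_bytes_per_s, latency_s)
```

Under this model, batching reduces the latency share of the communication cost, while
bandwidth bounds how quickly the payload itself can move between the client and the cloud.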
Security challenges are expected to arise when managing big data. Public cloud technology
can have weak security mechanisms. An attacker with malicious intentions can tamper with the
data travelling between the cloud computing storage system and HDFS. The attacker can spoof
the data or shut down the cloud computing storage system using a denial-of-service attack. It
will be important to find a security mechanism that allows the use of the public cloud. If
this is not addressed, it will be difficult to implement a big data analytics mechanism using
the proposed topology.
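The proposal does not name a specific security mechanism. Purely as an illustration of how
tampering in transit can be detected, the sketch below attaches a keyed hash (HMAC-SHA256 from
the Python standard library) to each payload. The shared key and the function names are
assumptions for this example, and the technique does not address denial-of-service attacks.

```python
import hashlib
import hmac


def sign(payload: bytes, key: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag that the sender transmits alongside the payload."""
    return hmac.new(key, payload, hashlib.sha256).digest()


def verify(payload: bytes, tag: bytes, key: bytes) -> bool:
    """Recompute the tag on receipt; a mismatch means the payload or tag was altered."""
    return hmac.compare_digest(sign(payload, key), tag)
```

The receiver would discard any record for which `verify` returns `False`. Distributing and
protecting the shared key is outside the scope of this sketch.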
Learning Big Data
Research indicates that the existing learning mechanisms used to handle the classification
of big data have run into problems. The problem involves a range of complex aspects and
influential data characteristics, including a variety of data forms, high velocity in the
processing of data, a diversity of data classes, and unstructured data. To address these
complexities and the issues that arise, the best option will be to use machine learning
techniques.
Machine learning is a technique developed for mining important information from data
through training and validation on labelled datasets (Cuayáhuitl, van Otterlo, Dethlefs &
Frommberger, 2013). Three challenges are expected when using machine learning techniques.
First, a machine learning technique that has been trained on one labelled dataset may not be
appropriate for another data domain, so the classification may not be robust across data
domains. Second, a machine learning technique is trained on a limited set of class types, so
the wide variety of class types found in a rapidly growing dataset will lead to faulty
classification results. Third, a machine learning technique is developed around a single
learning task, and thus it will not be effective for the multiple learning tasks and knowledge
transfer needs of big data analytics (Jap & Breier, 2014). Among the machine learning
mechanisms, algorithms have been developed to help in the classification of network data for
intrusion prediction. One such algorithm is the support vector machine. The support vector
machine will be suitable in this situation because it delivers excellent classification
accuracy. The main issue expected with the support vector machine is that its computational
cost will be higher than that of other classification techniques (Wang, 2016). The high
computational cost limits its effectiveness for big data analytics. Despite this, the best
option for the classification will be adopting the support vector machine. The issue that
remains will be finding a way to improve the support vector machine technique.
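To make the support vector machine idea concrete, the sketch below trains a linear SVM by
stochastic sub-gradient descent on the regularized hinge loss. This is a simplified variant
chosen for brevity, not the proposal's implementation: the proposal does not specify a kernel
or solver, and the two numeric features per sample, the labels (+1 for intrusion, -1 for
normal traffic), and the hyperparameter defaults are all assumptions for illustration.

```python
def train_linear_svm(samples, labels, epochs=300, learning_rate=0.01, reg=0.01):
    """Fit weights w and bias b by sub-gradient descent on the regularized hinge loss.

    Each label must be +1 or -1.
    """
    n_features = len(samples[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Inside the margin or misclassified: the hinge term contributes -y * x.
                w = [wi - learning_rate * (reg * wi - y * xi) for wi, xi in zip(w, x)]
                b += learning_rate * y
            else:
                # Correct with enough margin: only the regularizer shrinks w.
                w = [wi - learning_rate * reg * wi for wi in w]
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on which side of the separating line x falls."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1
```

Each pass over the data costs time proportional to the number of samples times the number of
features, which is the kind of computational cost that grows as the dataset grows.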
Representation learning algorithms will help supervised learning mechanisms attain
higher classification accuracy with computational efficiency. These algorithms will transform
the data, while preserving its original characteristics, into a different domain so that
the classification algorithms become more effective, run faster, and incur lower
computational complexity (Zheng, Li & Yang, 2011). Because of its large and growing domain,
the classification of big data will require a multi-domain representation learning mechanism.
The multi-domain representation learning technique will include feature extraction learning,
feature variable learning, and distance metric learning. The technique is expected to
encounter certain challenges in the classification of big data. These include the difficulty
of selecting the appropriate features, constructing geometric representations, extracting
suitable features, and separating suitable features (Mganga & Charles, 2011). Other
representation learning techniques have been suggested, such as the cross-domain
representation learning technique. This technique will help the classification of big data
function effectively. The issue that remains will be adopting a modern algorithm that works
alongside the current technology.
Learning the characteristics of big data only in the short term will not be effective in the
long run (Kübler, Wieringa & Pauwels, 2017). This will require the use of machine lifelong
learning mechanisms. Machine lifelong learning mechanisms will provide a framework that can
retain the information learned from training samples throughout all learning stages. The
components of the framework can be integrated with the proposed network topology. The framework
is expected to face certain challenges during the classification of big data. One of these
challenges is scalability. Scalability refers to the ability of a network to handle a growing
amount of work, or to be enlarged to accommodate that growth. Scalability is an important
aspect of big data applications (Al-Jarrah, Yoo, Muhaidat, Karagiannidis & Taha, 2015).
Scalability might be achieved in the suggested network topology, but communication challenges
will make it difficult for data to be transmitted
on time. Another expected challenge is validating the learned information and adapting it
to new data so that the learning process is not repeated needlessly.
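The idea of retaining learned information across learning stages, without repeating earlier
training, can be illustrated with an incremental classifier. The sketch below keeps one
running centroid per class and updates it one sample at a time. It is an assumption-laden toy
rather than a lifelong learning framework, and the class name and labels are hypothetical.

```python
import math


class IncrementalCentroidClassifier:
    """Keeps one running mean per class and updates it one sample at a time."""

    def __init__(self):
        self.centroids = {}  # label -> list of feature means
        self.counts = {}     # label -> number of samples seen

    def partial_fit(self, x, label):
        """Fold one labelled sample into the model without revisiting old data."""
        if label not in self.centroids:
            # A previously unseen class is added on the fly.
            self.centroids[label] = list(x)
            self.counts[label] = 1
            return
        self.counts[label] += 1
        n = self.counts[label]
        c = self.centroids[label]
        # Running-mean update: new_mean = old_mean + (x - old_mean) / n.
        self.centroids[label] = [ci + (xi - ci) / n for ci, xi in zip(c, x)]

    def predict(self, x):
        """Return the label whose centroid is closest in Euclidean distance."""
        if not self.centroids:
            raise ValueError("model has not seen any samples")
        return min(self.centroids, key=lambda lbl: math.dist(x, self.centroids[lbl]))
```

Because each update touches only one centroid, earlier samples never need to be stored or
replayed, and a new class can appear at any stage.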
Interaction of Big Data with Users
Real-time access to data and processing stages will be an issue in the classification of big
data using the suggested model. Understanding the interaction between the big data parameters
of cardinality, continuity, and complexity will be an issue and will require user interaction
(Obermeyer & Emanuel, 2016). When adopting the machine lifelong learning technique in the
system, additional user interaction is recommended to help classify big data efficiently.
The characteristics of big data will make data visualization difficult. Visualization
mechanisms such as dimension reduction and data projection will merely provide an abstract
view of the data (Witten, Frank, Hall & Pal, 2016). The abstract view will not be effective
because it will not give the real geometric representation of the data, and this is expected
to be one of the challenges faced. To reduce the problem of big data visualization in the
suggested model, the unit circle algorithm will need to be introduced. The unit circle
algorithm is expected to give unit circle representations of both intrusion and regular
traffic by mapping the large numbers of data points onto a unit circle (Grandhi, Varma &
Balusamy, 2017). With this in place, the user interaction and learning system will be able to
make effective data storage and transmission decisions.
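The proposal does not describe the internals of the unit circle algorithm, so the following
is not that algorithm. It only illustrates the general idea of placing data points on a unit
circle, here by scaling each two-dimensional point to length one and reading off its angle.
The function names and the choice of two features are assumptions.

```python
import math


def project_to_unit_circle(point):
    """Scale a 2-D point to length 1; return None for the origin, which has no direction."""
    x, y = point
    norm = math.hypot(x, y)
    if norm == 0:
        return None
    return (x / norm, y / norm)


def angle_degrees(point):
    """Angle of the point's direction, measured counter-clockwise from the x-axis."""
    x, y = point
    return math.degrees(math.atan2(y, x))
```

After such a projection every point lies on the same circle, so traffic samples can be
compared by angle alone, which gives a compact picture of large numbers of points.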
Communication challenges will cause lag and data loss during data transfer in the network
topology. Data uncertainty caused by the missing data problem will occur in the user
interaction and learning system (Lin & Yen, 2014). This, on the other hand, will
