Big Data Modelling

Executive Summary
In recent years, researchers have been working to provide effective and efficient solutions to the problems of the big data environment. Two major technologies are used for dealing with big data: the Hadoop file system and MapReduce. The large volume of big data requires a special platform that manages the data in a simplified format so as to increase data availability and reliability. The purpose of this research program is to analyse the complexities and impact of big data on applications and to propose an effective data management model for overcoming the problems of structuring big data. The proposed data model helps to increase the storage capability of the data format. The design of a big data architecture depends on the technologies used to develop the infrastructure, and the focus should be on managing the interaction between its components.
The aim of the research is to analyse the problems in managing the big data generated by the excessive use of social networking platforms and IoT applications. In this paper, we present data modelling techniques that systematically arrange big data in service and infrastructure formats.
The proposed data model should be capable of handling voluminous data in a systematic format while securing the confidentiality, reliability, and accuracy of the information. Consideration should be given to the diversity of the data arriving from different sources such as social networking platforms, IoT architectures, and artificial intelligence. The significance of the research study is to come up with a big data model for the easy storage and secure management of data available in large volumes. The proposed architecture provides user-friendly management of inventory data and thereby addresses the problem of big data management. The user interface lets the user interact with the features and check the availability of products in the warehouse so that they can be delivered on time. The inventory operations feed the interface with a statistical view of the data organization, which helps in relating the sales ratio to the availability of products in the warehouse. Implementing the proposed big data architecture for managing the organization's inventory helps the user take proactive action to manage and synchronize the availability of commodities. A threat to inventory availability can be predicted by comparing the present stock level with a normal threshold value. The Spark streaming process is used for obtaining the required and
updated information from the data warehouse. MongoDB is used for reading the inventory information from the batch processing unit. Notification and alert signals are sent to the user so that a proactive action plan can keep the availability of each commodity in balance with the demand placed by the users.
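As an illustration of this pipeline, the sketch below wires a Spark Structured Streaming job to a MongoDB threshold lookup. The Kafka topic, connection strings, database names, and field names are placeholders invented for the example rather than the exact implementation described later in the thesis.

from pymongo import MongoClient
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("inventory-monitor").getOrCreate()

# Batch side: read each commodity's threshold level from MongoDB
# (hypothetical database and collection names).
client = MongoClient("mongodb://localhost:27017")
thresholds = {doc["item_id"]: doc["threshold"]
              for doc in client["warehouse"]["inventory"].find()}

# Streaming side: consume live stock updates from a Kafka topic.
updates = (spark.readStream.format("kafka")
           .option("kafka.bootstrap.servers", "localhost:9092")
           .option("subscribe", "stock_updates")
           .load()
           .selectExpr("CAST(key AS STRING) AS item_id",
                       "CAST(value AS STRING) AS quantity"))

def check_thresholds(batch_df, batch_id):
    # Flag items whose current quantity has fallen below the stored threshold;
    # printing stands in for the SMS/push notification mentioned above.
    for row in batch_df.collect():
        if int(row.quantity) < thresholds.get(row.item_id, 0):
            print(f"ALERT: {row.item_id} is below its threshold")

updates.writeStream.foreachBatch(check_thresholds).start().awaitTermination()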
It can be concluded that the proposed model for managing the big data of the inventory control system succeeds in producing real-time reports of commodity availability in the warehouse. The user is alerted through signals sent to their mobile phone informing them about underflow and overflow conditions of the stock. A distributed environment is developed to optimize object sharing during parallel processing.
Contents
Executive Summary
Chapter 1: Introduction
Research Aim
Research Problem
Research Questions
Thesis Scope
Thesis Contribution
Thesis Outline
Chapter 2: Literature Review
Introduction
Background
Big Data Modelling
Tools and Technologies in relation to big data
Big Data Modelling Challenges and Research Issues
Big Data modelling progress and research trends
Chapter 3: Research Methodology
Chapter 4: Proposed Big Data Architecture
Related work
Proposed Architecture
Description of the System Architecture
Pre-Processing functions
Chapter 5: Implementation and Results
Chapter 6: Conclusion
References
Chapter 1: Introduction
The enormous use of the internet results in the generation of terabytes of data. This huge amount of data creates the critical problem of retrieving relevant information from an ocean of information within a fraction of a second. The birth of technologies based on the internet of things (IoT) and the cloud computing architecture requires significant management schemes for handling the accuracy and reliability of data (Adam et al., 2014). A structured approach is required for managing the big data format so as to increase the availability of the data on request by the user. The evolution of technologies in various domains such as medicine, government, banking, and many more increases the pressure to manage data in a structured way. In the last few decades, relational databases have been used for managing data in a proper format and have been efficient in meeting the requests and responses placed by users (Acharjya and Ahmed, 2016). The inclusion of social media and the IoT environment generates data in terabytes and petabytes, which requires an effective system for meeting the requirement of data availability. In recent years, researchers have been working to provide solutions for managing the problems of the big data environment in an effective and efficient manner. Two major technologies are used for dealing with big data: the Hadoop file system and MapReduce (Alam et al., 2014). Big data is characterised by the 5Vs: velocity, variety, veracity, volume, and value. The velocity of the data can be judged by analysing the processes of data generation and collection; a higher velocity of data transmission increases the efficiency of the real-time architecture of the system. The large amount of data generated by IoT applications and social media networking characterises the big data volume. The accuracy of the data is responsible for application usability, which relates to big data value; the value of the data can be measured in terms of monetization. The data is available in a variety of formats which have to be synchronized in a structured, semi-structured, or unstructured manner (Chebotko, Kashlev, and Lu, 2017). Data analysis and management processes are required for synchronizing the data systematically so that it can be retrieved on demand by the user. Reliability and trustworthiness are the measures of data veracity, and the credibility of an IoT application depends upon the accuracy of the data retrieved in different formats. The data is classified into three types in the table below:
Type of data | Size of data | Characteristics | Tools used | Methods used | Examples
Small data | Megabytes | Hundreds of tuples can be synchronised effectively | Database prepared in Excel | Simple calculations | Customer records of a small organization
Large data | Gigabytes and terabytes | Millions of tuples organized in a proper structured format | Relational database | Data mining and data warehousing techniques | Customer records of a big organization
Big data | Terabytes and petabytes | Billions and trillions of tuples organized in a proper structured format | Cloud computing technologies, Hadoop file system, NoSQL database | Distributed file system, MapReduce technologies | E-commerce trading, social networking websites, internet of things applications
The large volume of big data requires a special platform for managing the data effectively in a simplified format so as to increase data availability and reliability. The purpose of this research program is to analyse the complexities and impact of big data on applications and to propose an effective data management model for overcoming the problems of structuring big data. The proposed data model helps to increase the storage capability of the data format. The design of the big data architecture depends on the technologies used in developing the infrastructure, and the focus should be on managing the interaction between the components. A structured format should be developed for managing data sources such as social media, IoT applications, and others. A streaming process should be used for managing the information coming from sensor data. The ingestion of the information should be carried out in real time, using the MapReduce technique so as to minimize the presence of replicated data on different data servers. The assimilation of the data is optimized by developing a heterogeneous format.
The operational efficiency of the big data architecture can be accelerated by developing a master data management plan. Real-time analysis of the data helps in obtaining the most up-to-date version of the information. The functional and operational view of the data can be broadened by building landing repositories for the information. Data exploration is enabled by organizing the data sets and blocks of information so that the required information can be obtained in a minimum period of time. The security and confidentiality of the information are two major issues related to the privacy of heterogeneous data. Big data models result in improved key performance indicators for the developed application. A hierarchical taxonomy is prepared for developing a structured ontological format. The synchronization of different and hybrid technologies in a single format helps in developing an effective design model to obtain the required information. The atomicity of the information can be achieved by applying normalization techniques. The designs of data models are influenced by schema formation and the integration of different components. The Hadoop model is used for storing the sensor data of IoT applications on distributed servers. The correlation between the components helps in minimizing the latency of retrieving the updated information, and the aggregation of the information helps in managing data governance. The database schema should be designed for visualizing data on distributed servers. The information is stored systematically in an RDBMS so that business intelligence operations can be run on the working model of the warehouse. The interaction between the components helps in preserving a well-structured format to manage data from internal sources by streamlining the sensor data and devices. The information is assimilated from various sources to give the real-time situation and availability of the data in the warehouse, and it can be retrieved in a timely manner through the streaming process and the inclusion of MapReduce technology. The operational efficiency of the big data can be improved by developing a master data management system, which focuses on managing the operational data stores to improve the power of computation and procedure. The big data architecture is developed for synchronizing the data patterns according to the data collected from real-time sensor devices. The information is stored in repositories of large chunks of memory for a systematic handling of real-time data from heterogeneous sources. The activities are systematically sequenced to expose the supporting functions and the operational working process. The data is governed through the exploration of the security architecture and the formation of metadata. Advanced technology is applied for developing the structured architecture of the data sets. The conceptual model should be developed by following the semantics and symbols of the data to prepare the data nodes. The consistency of the data items should be defined for managing the interaction between the data components.
The complexity of the data is largely defined by the availability of the storage units and the database. The pre-processing of the applications helps in developing responses to user queries within a minimum period so that the standard information can be extracted. The integrated sets of information are managed so that information can be accessed from multiple heterogeneous data components. The underpinning data governance model helps in developing an effective data management model. The integrated sets of data are arranged for developing the data models, and physical models are developed for shaping the physical, structured storage of data items.
The schema of the data structure depends upon the physical arrangement of the database to manage the flow of information. The SQL language is used for defining the physical boundaries of the data stored on the Hadoop architectural platform. The expected user access pattern helps in developing the physical model and in representing the concurrency control applied to the data retrieval process. The data latency period is calculated for developing real-time management of data in the warehouse; the latency period is influenced by de-normalization and the batch processing unit. The correlation between the data items helps in improving the performance of the processes synchronized into the query queue. Cleansing operations are performed when calculating aggregate functions so that data replicated on different servers is transformed consistently, and a server is authorised to retrieve information from the data server where the replicated information is stored. The mapping of the processes should focus on the dependency between the tables and column data through the application of association rules. The critical condition of data availability can be detected by comparing against the threshold value of the system. A strong relationship should be maintained in the flow of data packets from the sender to the destination. Central control and authorisation of the business processes help in resolving the artefacts of figures and semantics. Access control policies are developed for giving a structured view of the Hadoop database, and consideration should be given to managing the atomicity of the data retrieval process. An unstructured view implies a weak schema and a structured view implies a strong schema for developing the overall architectural format and resolving the relevance between heterogeneous data sources. Modelling techniques are applied for taking effective decisions when extracting transactional information. The correlation between mobile traffic and network congestion helps in managing the transmission speed of the data items. The business intelligence functions are implemented to synchronise the aggregation of the data mart functions that define the use cases.
The scaling approach helps in keeping petabytes of information balanced in a sequential manner. An audit control program is developed for retrieving information from the historical data managed in a hierarchical tree structure, and data exploration is done through this audit control program so that regulatory operations can be applied to the data sets. A decision tree is developed for exploring the requirements and retaining the value of the business processes. The schema definition helps in retrieving information from multiple sources and in managing the external control on the data units. Read and write operations are performed by applying late-binding procedures to improve the mapping of the data from the various operational information stores. Repositories of data are developed to fulfil the access control needs of the data. The logical infrastructure of the Hadoop file system is developed by combining the downstream flows of data to achieve accuracy and trust in the information. The schema designs should be developed for managing the operational view of the business processes. The latency of obtaining the response to a query can be minimized to improve the consistency of the data units. The data processing unit helps in resolving the governance of the functional unit of the conceptual model. A text analytical approach should be designed for fulfilling the data demand of the user; it helps in accelerating the data retrieval process. An auditing program should be defined for identifying the metadata and extracting information from the transcript data processes. The information is organized in a hierarchical tree structure. The data is adjusted and transformed into specific, relevant information by performing cleansing operations. The underpinning of the information helps in resolving the consistency of the data items arranged in tabular format. The repositories of the data sets are stored in the form of graphs and pictorial representations of the information. Remote sensing parameters are used for developing the networks that arrange the information in a static grid format, and data stored in image format is organized into sequences of information for mapping out invalid data. The sorting of the assorted data helps in increasing the operational capability of the database management program. The large volume of data is divided into smaller, valuable chunks of information through a descriptive analytical approach. The assorted data is sequenced through the measurable value of data mining operations, and mining is performed through a strategic approach of forecasting relevant information. The verified information helps in shaping strategies for displaying the data in a sequential format. Improving the quality of the data improves the decision-making capability. Pattern matching of the information increases the fault tolerance capability of the server so that the data can be managed in an effective format.
The large volume of data helps in increasing the visualization of information and in filling the gaps in potential retrieval of information. Priority should be given to the privacy issues of the information when demonstrating the big data management system. A topology is defined for increasing the accessibility of operational data within the large volume of information. The Google file system is maintained for developing the large data models on the network. The vulnerabilities associated with the data management system should be foreseen so that the consistency and accuracy of the information can be improved. The expansion of the information helps in accelerating the data retrieval process. The scalability of the data server can be improved by developing open source code. The diversification of the data domain increases the independent association of the data items so that the program can run on different platforms. The clustering of the data can be done systematically through the Hadoop file system, and the processing of the information can be improved by attaching a parallel computation system. The record-keeping architecture is developed by initiating the parallel processing unit in cluster and batch systems. The Hadoop architecture is developed by combining the operational efficiency of the distributed file system with the MapReduce execution engine. The master-slave architecture of the information is developed for formulating the structural view of the information. The flexibility of the database architecture is exercised by implementing read and write operations on the data model. The name nodes are organised for processing the MapReduce function on the data blocks arranged in a sequential format. The mapping of the information is done by arranging the assorted data in sequential, sorted order.
Research Aim
The aim of the research is to analyse the problems in managing the big data generated by the excessive use of social networking platforms and IoT applications. In this paper, we present data modelling techniques that systematically arrange big data in service and infrastructure formats. The focus is on overcoming the inefficiency of the traditional data model by proposing an effective data model for managing big data.
Research Problem
UML technology is typically used in traditional data models for systematically synchronising data in a structured format. Such models find it difficult to handle the exponential growth of data from various sectors such as health care centres, business organizations, artificial intelligence, IoT infrastructure, and others. The upcoming model should provide
elasticity for managing the interfaces between open data. The invention of NoSQL databases helps in arranging data without a relational schema. Security is the major problem of the big data management system and has to be resolved with the proposed data model. The available data models are not effective in managing the realm of big data. The stability of the data depends on the organization of the big data in a systematic format. A functional approach is used for operating on the big data so that it can be easily handled and retrieved according to the demand raised by the operator. The data may be physically stored in structured, unstructured, or semi-structured formats. The Software as a Service model, NoSQL databases, MATLAB, and MapReduce technology are used for handling big data in a structured format, but the limitation of these models is that they generate a lot of intermediate data. Storage and management issues are the major problems related to big data. Data qualification, quantization, and validation are processes used for increasing the accessibility of the data in the specified manner.
Research Questions
1. What is big data modelling? What are the challenges faced in the application of big
data modelling techniques?
2. What are the applications of big data modelling?
3. What progress and recent trends are seen in big data modelling?
4. How is big data modelling useful for managing the voluminous data of cloud
computing and IoT architectures?
Thesis Scope
The research study helps in analysing the problems related to the storage and management of big data. A systematic arrangement of the database is required for storing and managing the data in a classified format so as to increase the availability of accurate data in a minimal amount of time. An effective and efficient data model is required for keeping the request and response model in balance and for increasing the efficiency of arranging petabytes of data. The proposed data model should be capable of handling voluminous data in a systematic format while securing the confidentiality, reliability, and accuracy of the information. Consideration should be given to the diversity of the data available from different sources such as social networking platforms, IoT architectures, and artificial intelligence. The significance of the research study is to come up with a big data model for the easy storage and secure management of data available in large volumes.
Thesis Contribution
The research helps in identifying the challenges and issues that arise when applying big data modelling techniques to manage the large amount of data generated from social media platforms and IoT infrastructure. The focus is on the applications and technology required for keeping track of big data. The identification of recent trends and progress opens new doors for research into solutions and technologies capable of addressing the challenges and issues of big data. The research is carried out in the direction of developing a model for managing big data which is useful for handling the voluminous data of cloud computing and IoT architectures. In this paper, we propose and design a big data modelling technique for managing a stock inventory system by sending an alarm and a message to the owner's mobile phone about underflow and overflow conditions of the stock available in the organization.
Thesis Outline
The research program is divided into segments so that facts and figures can be collected to analyse the big data modelling process. The first chapter focuses on identifying the requirements and importance of big data modelling for handling the voluminous data of various applications. The second chapter gives the details of the literature and research carried out in the field of big data modelling. The third chapter presents the challenges faced in the application of big data modelling techniques. The fourth chapter focuses on the applications and technologies used in big data modelling. The fifth chapter contributes in the direction of highlighting the progress and recent trends seen in big data modelling. The sixth chapter sets out the architecture and framework of a big data model which is useful for managing the voluminous data of cloud computing and IoT architectures. The seventh chapter presents the implementation of big data modelling in a real-life example. The research program ends with the conclusion and the future scope of the thesis.
Chapter 2: Literature Review
Introduction
The study of the literature helps in analysing the complexities and problems associated with the management of voluminous data. Special data models are required for managing the exponential growth of the data generated from different sources such as social media, artificial intelligence, IoT, and others. Heterogeneous data requires systematic management across different data formats so as to increase its accessibility. Big data tables are drawn up for arranging the big data generated by social networking platforms. From the research, it has been found that only 17% of big data is available in structured format, while 83% is unstructured or semi-structured. It has been reported that around 600 TB of data per day is generated by Facebook alone. The data residing on the internet can be biased and can create noise in the data transmission process. The large volume of real-time information can be stored in memory with minimal loss of precision. Consideration should be given to managing the biased data under a separate data format so as to improve the velocity of the data transmission process. Different companies use different data models for managing their big data. Analytical processing methods are used for managing and storing the voluminous data to minimize the response time and balance the workload in a systematic way. MapReduce is a parallel processing technology based on a functional approach and is used for processing the data stored in the Hadoop file structure. The major source of big data is social networking. Social network analysis (SNA) is used for mapping the formal and informal relationships among the different data available. Pattern matching and text mining technologies are mainly used for balancing the request-response model. Accuracy is required in advanced data visualization (ADV) for improving decision-making capabilities so that effective decisions and responsible actions can be taken. The efficiency of a data model can be judged by the level of abstraction provided to the information. The information should be abstracted at three levels, classified as the physical level, the logical level, and the conceptual level. The composition of the design process helps in analysing the power of the procedure to handle the security and accuracy of the information. Operational databases use the entity-relationship model for presenting the conceptual level of abstraction of information. Decision support systems use the star schema format for presenting the logical and physical levels of abstraction. Big data uses graph databases for presenting the logical and physical levels of abstraction. The construction of a data
model model should be based on the level of abstraction used for securing the information from intrusion. The efficiency and effectiveness of the data model depend on the level of abstraction used for retrieving the information from the database. The suitability of a data model depends on the following six features:
1. The application domain used by the entity
2. Read and write operations performed on the database
3. Types of operations used on the database
4. Operations carried out by using concrete languages
5. Level of abstraction used for presenting the information
6. Query language used for extracting the information from the large volume of data
Background
The rapid increase in voluminous data requires the development of effective data models which can synchronise the data and transmit it simultaneously according to the requirements placed by the users. The decision-making process depends on unstructured data which has to be presented in a simplified structure so as to minimize the complexities. Big data can be classified along five different aspects, which are represented in the table below:
Big Data Categorization
Different sources of data | Format of content | Data stores | Data staging | Data processing unit
Social networking websites | Unstructured | Documentation | Normalization | Batch processing system
Companies | Semi-structured | Key-value architecture | Scaling | Real-time processing system
Internet of things applications | Structured | Big data table | Transformation |
Artificial intelligence | | Column orientation | Transactions | Natural language processing architecture
Personal computers | | Graph based | Tokenization |
The data source represents the data coming from different platforms such as social media, the customer databases of large organizations, sensor data used in IoT environments, and many more. The uniform resource locator is used for sending and retrieving information to manage the big data domain effectively. The NoSQL database is used for managing the communication between the objects so that real-time processing and the flow of information can be managed (Ribiero et al., 2015). The normalization process is used for reducing data redundancy so that minimal space is used for storing the information. The clustering of data is used for partitioning the data and arranging it in a hierarchical architecture so as to balance the probability distribution of the data. The large volume of data carries noise frames which create complexities in the transmission of big data.
1. Data model used for handling social media:
The big data table is the generic data model used by Google for managing the big data of social media platforms. The contents and comments of user accounts are systematically handled in the big data table. Key-value is the generic data model used for managing the records in a row-wise structured format. The column value of the row-wise record architecture acts as a unique identifier for finding the contents and comments posted by the user.
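To make the row-key idea concrete, the following toy sketch (plain Python, not Google Bigtable itself) keeps each user account as a row key with posts and comments as column/value pairs, so one lookup returns everything tied to that account; the names and values are invented.

from collections import defaultdict

# row key -> column qualifier -> value
big_table = defaultdict(dict)

def put(row_key: str, column: str, value: str) -> None:
    big_table[row_key][column] = value

def get_row(row_key: str) -> dict:
    return big_table.get(row_key, {})

put("user:alice", "post:2023-04-01", "Enjoying the conference!")
put("user:alice", "comment:2023-04-02", "Great talk on data modelling.")

print(get_row("user:alice"))  # all content stored under the row key "user:alice"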
2. Data model used for handling cloud computing:
The cloud computing data can be systematically organized in a particular schema. The data arising from different sources can be effectively stored in the form of metadata. The Dublin Core Metadata Element Set (DCMES) is used for managing the metadata in the unstructured format. Its 15 elements are used for dividing the data into 15 different categories, shown in the table below:
1. Title        2. Rights        3. Explanation
4. Location     5. Date          6. Relation
7. Language     8. Creator       9. Format
10. Publisher   11. Contributor  12. Type
13. Subject     14. Identifier   15. Source
The big data dictionary is used for managing the mapping between the metadata elements so as to provide effective information to the user on demand.
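The sketch below shows one such record keyed by the fifteen element names listed above (where "Explanation" and "Location" correspond roughly to the DCMES Description and Coverage elements), together with a small lookup helper standing in for the big data dictionary; all values are illustrative.

dublin_core_record = {
    "Title": "Warehouse sensor log",
    "Creator": "Inventory IoT gateway",
    "Subject": "stock levels",
    "Explanation": "Hourly stock readings per commodity",   # a.k.a. Description
    "Publisher": "Warehouse operations",
    "Contributor": "Shift supervisor",
    "Date": "2023-04-19",
    "Type": "Dataset",
    "Format": "text/csv",
    "Identifier": "urn:warehouse:log:2023-04-19",
    "Source": "sensor-node-12",
    "Language": "en",
    "Relation": "urn:warehouse:log:2023-04-18",
    "Location": "Warehouse A",                               # a.k.a. Coverage
    "Rights": "Internal use only",
}

def lookup(records, element, value):
    """Return every record whose metadata element matches the given value."""
    return [r for r in records if r.get(element) == value]

print(lookup([dublin_core_record], "Language", "en"))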
3. Data model based on Ontology architecture:
Conceptualization and abstraction methods are used for representing data of the real world. The ontology architecture is used for representing heterogeneous data by expressing its constraints and identification in a clearly specified format. Unstructured data can be simplified by using the MapReduce concept within the framework of ontological technology. High scalability can be provided by MapReduce technology, which increases performance, fault tolerance capability, and the speed of query processing.
4. Natural Language processing (NLP) ontological data model:
The unstructured data collected from different sources is systematically organized by using the natural language processing model. Communication can be carried out effectively by using grammar rules to express feelings, opinions, and thoughts. The different techniques used for managing the data in the NLP architecture are the anchoring method, reframing, and future pacing. The text can be expressed in a structured format. The tokenization process is used for handling semantic tagging, which identifies the words and phrases in a paragraph. Pattern matching techniques are used by intelligent agents for analysing the presence of text and content in the IoT environment. NLP technology is used for managing the relationship between the request and response system so that the dependency among the different domains can be effectively balanced.
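A minimal sketch of the tokenize-then-tag step described above, using only the Python standard library; the keyword list is an invented stand-in for a real semantic-tagging resource.

import re

def tokenize(text: str) -> list[str]:
    # Split a sentence into lower-case word tokens.
    return re.findall(r"[a-z']+", text.lower())

KEYWORDS = {"temperature", "humidity", "stock"}  # illustrative domain terms

def tag(tokens: list[str]) -> list[tuple[str, str]]:
    # Attach a crude semantic tag to each token.
    return [(t, "DOMAIN" if t in KEYWORDS else "OTHER") for t in tokens]

sentence = "The stock sensor reported low temperature in aisle three"
print(tag(tokenize(sentence)))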
5. Use of clustering algorithm:
The clustering of big data can be done by using clustering algorithms such as the K-means algorithm, Gaussian mixture clustering, spectral clustering, and many others.
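As a small illustration, the snippet below runs K-means on toy two-dimensional points using scikit-learn (assumed to be installed); the other algorithms mentioned follow the same fit/predict pattern.

import numpy as np
from sklearn.cluster import KMeans

points = np.array([[1, 2], [1, 4], [1, 0],      # one dense region
                   [10, 2], [10, 4], [10, 0]])  # another dense region

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(points)
print(labels)  # e.g. [1 1 1 0 0 0]: each point assigned to one of two clusters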
Big Data Modelling
Data models play a vital role in managing operational databases, big data technologies, and decision support systems. Data models are effective for managing the operational and functional programs related to the entity-relationship model, and for managing the star schema architecture and OLAP models used to take effective decisions. Data models fall into three categories, which are discussed below:
Physical Data model:
In this model, the data is represented in a hierarchical order of classes. The entities comprise attributes which represent the characteristics and behaviour of the class. The entity-relationship diagram is used for representing the relationships between different classes along with the classification of their attributes. A query language is used for extracting the information from the RDBMS.
Logical Data model:
This model is the pictorial representation of the conceptual model. The entities are platform independent and are uniquely identified through primary and foreign keys. A mapping process is implemented for identifying the relationships between the participating classes and data objects.
Conceptual Data model:
The layer of abstraction between the classes and objects is defined through the conceptual representation of the data. The problem domain specifically defines the relationship between the attributes and primary keys for managing the flow of information in the database schema. The conceptual data model is used for analysing the data values and the relationships between the entities and attributes to clarify their characteristics and behaviour in a more simplified format (Sharma et al., 2014). The complexities of the physical database can be resolved by representing the data through the process of conceptualization, and a higher level of abstraction can be created through the use of the conceptual data model. Voluminous data is generated due to the association of mobile networks. The assimilation of mobile and cellular phones helps in increasing the mobility of the data packets from sender to receiver. The validity of the information depends on the reliability and accuracy parameters used for measuring the network requirements so as to increase the interaction between the data users. The specification should be optimized for managing resource utilization among the cloud devices.
The scaling of the devices can be increased by developing a layer of abstraction between the isolated devices placed in the network model. The devices and information are scheduled and provided according to the priority measured through a priority scheduling algorithm. The virtualization of the information at different layers of the processing model helps in increasing the retrieval of data by making the storage devices and resources collaborate so that cross-layer computation is enabled. The placement of the information depends on the computational power required by the applications scheduled on the different data stores and management technologies. The complexity of the resources can be managed by measuring the accuracy and reliability of the information, and the validity of the processing parameters defines the multi-dimensional view of the information needed to give accurate results. The reliability of the resource distribution can be increased by arranging the dynamic data in the columnar tables of the data stores. A cost estimation ratio is developed for increasing the efficiency of the workflow and data flow by organizing the schedule of process operations. The scalability of data utilization depends upon the availability of information in heterogeneous and varying formats. Search engines are placed for finding the information from the name node at the data stores and centres by preparing a block ID index table. The business intelligence program is used for injecting the queue of processes and resources into the database. The acquisition layer is used for preparing batches of processes by integrating the information and functions from the HDFS environment. The transmission of the required information takes place by using extraction and loading functions on the data stores. The ingestion of the information takes place through pre-defined, pre-processed read and write operations that update the information according to the demand placed by the user. The transmission of mission and alert signals helps in increasing the performance of the database. Automatic decisions can be taken by analysing the real-time situation of the devices according to the information collected from the smart devices. The external sources are arranged by sequencing the data streams and the messaging queue so as to prepare the paradigm of information. The persistency of the information can be improved by dividing the information according to the variety and volume of data available in the heterogeneous environment.
Tools and Technologies in relation to big data:
The exponential growth of data requires effective tools and technologies for managing big data in a systematic and sequential format. Innovation in sensor data and social networking puts pressure on the data models (Malik and Sangwan, 2015). The RDBMS is
ineffective in storing big data in its customised data tables. The inefficiency of the RDBMS can be resolved by making use of the NoSQL architecture. A non-relational database is useful for handling big data in a structured manner (Patil and Bhosale, 2018). The normalization process is applied for reducing the redundancy of the data stored in the NoSQL database.
Key-value architecture: A data dictionary is constructed for mapping the data available in the database through key-value pairs. Unique identifiers serve as the keys for accessing records within the voluminous data available. The data values are stored in array data types.
Column-based database: The data is stored in columns in a column-oriented database, usually in compressed form. Information can be retrieved from the database by designing queries in a systematic format. A systematic arrangement of the database is required for storing and managing the data in a classified format so as to increase the availability of accurate data in a minimal amount of time.
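The sketch below contrasts a row-wise layout with the column-oriented layout described above: every attribute is kept as its own list, so a query that touches one column reads only that column's values (the records and field names are invented).

row_store = [
    {"item": "bolt", "qty": 120, "region": "north"},
    {"item": "nut",  "qty": 300, "region": "north"},
    {"item": "bolt", "qty": 80,  "region": "south"},
]

# The same data pivoted into a column store.
column_store = {
    "item":   [r["item"] for r in row_store],
    "qty":    [r["qty"] for r in row_store],
    "region": [r["region"] for r in row_store],
}

# A query that touches a single column reads only that column's values.
total_qty = sum(column_store["qty"])
print(total_qty)  # 500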
NoSQL database: The NoSQL database is a schema-less database architecture. The information and data are stored in a tabular format, and the relationships within the architecture can be represented through unique identifiers and key values.
Document-oriented database: The information and data are stored in text form, for example in MS Word format. The text file is extracted from the database by using a key-value pair. The index value is used for searching the data in the data dictionary so that requests and responses can be balanced systematically.
MapReduce technologies: MapReduce technology is used for clustering the data in the Hadoop file structure (Gopu, 2017). It is a parallel processing program for filtering the data in big data clusters. Aggregation functions are used together with the map function for reducing the redundancy of the heterogeneous data, and the unstructured data is organized in a systematic format.
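The single-process sketch below imitates the map, shuffle, and reduce phases on a handful of records, with the reduce step collapsing duplicates per key; a real job would run over HDFS blocks on a Hadoop or Spark cluster, but the flow is the same.

from itertools import groupby
from operator import itemgetter

records = ["user42,login", "user7,click", "user42,login", "user42,purchase"]

# Map phase: emit (key, value) pairs from each raw record.
mapped = [tuple(r.split(",")) for r in records]

# Shuffle phase: group the pairs by key (sorting stands in for the shuffle).
mapped.sort(key=itemgetter(0))
grouped = {k: [v for _, v in g] for k, g in groupby(mapped, key=itemgetter(0))}

# Reduce phase: de-duplicate the values collected for each key.
reduced = {k: sorted(set(vs)) for k, vs in grouped.items()}
print(reduced)  # {'user42': ['login', 'purchase'], 'user7': ['click']}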
Wide column store database: Key-value pairs act as unique identifiers representing the data in a wide column database. Primary and foreign keys are used for extracting the required information from the huge volume of data (Sai and Jyothi, 2015). Batch processing helps in sorting and filtering the data from the data clusters.
Hive: The Hive platform is a data warehousing layer built on top of the Hadoop file structure. Data mining tools are used for extracting information from the big data warehouse system and for retrieving information in the form of plain text and HBase files. Business intelligence workloads can run effectively on the Hadoop clusters, and SQL-like queries are used for retrieving the relevant information.
Storage: Data compression techniques and redundancy-reduction methodologies are used for minimizing the space consumed by the voluminous data. Virtualization of the storage units is used for handling the data in compressed form.
HBase: The scalability of the database can be increased by using the Hadoop distributed file system. The information is stored in a column format in a structured manner.
Graph database: The graph database is used for presenting the relationships between domains and entities by using key-value pairs (Kaur et al., 2016). Patterns are used for detecting relationships in the social networking architecture, and a user-friendly environment can be created by using the graph database.
Chukwa: Chukwa is a data analysis system used for monitoring large, voluminous data and transmitting peer-to-peer information. Semantics are added to the log collection for managing the data in the specified format.
Batch-oriented processing: This paradigm is used for storing and analysing the information in a structured manner. Parsing and sorting functions are mainly used for analysing the data presented in a batch. MapReduce is an example of a batch processing system (Khan et al., 2015). Chunks of data are collected before the processing queries act on them.
Stream-oriented processing: Real-time synchronization and flow of information are handled effectively through the stream-oriented processing methodology (Sullivan, Thompson, and Clifford, 2016). The large volume of real-time information can be stored in memory with minimal loss of precision. A data stream management system is used for resolving the SQL queries so that relevant information can be retrieved according to the demand placed by the user.
Online transaction processing: OLTP is used for improving the performance of the relational database. The performance of the database can be improved further by adopting a NoSQL
database. The interaction between the entities is carried out through the query language, and low latency can be achieved through the query interface (Elgendy and Elragal, 2014).
The emergence of big data brings varying complexities arising from the massive volume of data, such as information security, confidentiality, reliability, accuracy, scalability, computational discrepancies, and others. Statistical methods are used for scaling the big data. It is difficult to visualize data within the large volumes collected from different fields. Security and accuracy are the major concerns in dealing with big data semantics (Mo and Li, 2015). Graphical interpretation is the major factor in finding the relationships between the different entities; in a large volume of data, it is difficult to find meaningful relationships and correlations among the participating units. Authentication and encryption methodologies are used for preserving the confidentiality and sensitivity of the information. Real-time monitoring of the data is useful for managing the communication in IoT applications and for analysing the presence of intrusion. A security model should be deployed for identifying the presence of intrusion and keeping the information secure from hacking (Mukherjee and Shaw, 2016). Traditionally, records were stored in a systematic order using a unique identifier, because the limited amount of information available could be handled easily by the RDBMS. Data modelling techniques are now required due to the exponential growth of big data. It is difficult to manage the relationships between the entities and the data sources, and the focus should be on managing the relationships among the available open data. The security of the information is a major concern in corporate governance. A new paradigm is required for data modelling to handle big data effectively. The efficiency of a data model can be judged by the level of abstraction provided to the information. The available data models are not effective in managing the realm of big data; it has been found that the traditional system of handling big data does not provide effective linking between pieces of information. Big data requires open interfaces among the information stored in the database schema. The data is stored in an unstructured manner in the database (Rahman et al., 2018). Consideration should be given to designing a data model that is relevant for managing big data securely and accurately (Pol, 2016). The inclusion of social media and the IoT environment generates data in terabytes and petabytes, which requires an effective system for meeting the requirement of data availability. Storage and management issues are the major problems related to big data.
The study of the literature accumulates the challenges and issues which arise in fetching the required information from a large amount of data. The security of the information and the computational complexities are the major problems which create disturbance in getting relevant information from the ocean of data. The challenges and issues of big data fall into four main problem areas: the systematic analysis and storage of data and information, computational complexity and the knowledge management program, the visualization of information, and lastly the security of information. It is difficult to keep track of the required information within the data volume. A systematic framework should be developed for keeping the files and information in a particular order so that relevant information can be fetched in response to the query posted by the user. The knowledge management program plays a vital role in keeping the information balanced in a sequential order.
Big Data Modelling Challenges and Research Issues
A big data model is a new paradigm for managing the information infrastructure so that queries on and responses from the voluminous data are supported. The interdependencies of processes help in stabilizing the data schema designs developed for supporting transactions of information with the database. The scaling of the information can be done effectively through the parsing process, which is used for extracting information from the vast data collected from different platforms. The unstructured data is collected from internet websites, IoT applications, social networking platforms, remote areas, and others. Remarkable keys are developed for retrieving the information through the process of mapping. The graph-based organization of data helps in sorting out the quality data set for the particular query posted by the user.
There are various issues related to the management process used for handling big data in a structured format. The expansion of the information carries the hazard of having to manage voluminous data in a structured format and orientation so that relevant information can be fetched in a fraction of a second, which helps to speed up the communication between the sender and receiver. Some of the issues related to the big data characteristics are discussed below:
Data velocity: The research shows that the expansion of information from different platforms is the major hurdle in transmitting the information between the participating members within a fraction of a second. The transmitting speed of a packet depends on the
network congestion caused by the occurrence of voluminous data. Effective policies and procedures should be developed for managing network congestion and traffic.
Data volume: The balance of the structured format depends on the availability of a large data set to handle the required queries effectively. The interdependency of the data used to manage the process can then be effectively controlled and balanced. The persistency of the information depends on the amount of data available in the database. The database should be capable of handling the voluminous data in a structured format so that the retrieval process can be sped up accordingly. Communication between the sender and receiver can be carried out over terabytes of data by using an efficient big data model and technology.
Data value: The confidentiality and security of the information stored in the database are major issues in handling big data. The information should not be cracked or leaked by a third party. Tools and technologies should be used for preserving the reliability and accuracy of the information so that the organization of the process can be carried out effectively. An ethical code of conduct should be implemented for protecting the privacy of the information and data required by the user. It helps in resolving the issues related to the data storage and management process.
Data variety: The synchronization and arrangement of data and information in structured and unstructured formats helps in distinguishing the availability of data in different structures. Expansion of the database framework is required for managing the variety of data in an effective manner so that a query posted by the user can be answered within a fraction of a second. The assemblage of data and information in an unsystematic manner can create a problematic situation for handling the responses. Symbols and semantics are a major problem with regard to data availability. Only limited data can be stored in a systematic manner, and the jumbling of data creates problems in identifying the required information from the ocean of data.
Data storage: The major issue is the storage capacity of the database implemented for handling the data of the application. The overflowing of the data creates a problem in retrieving required information from the database, which in turn creates the problem of data leaking and loss of confidentiality of the information. Terabytes of data require a special mechanism for handling the huge amount of data so that under-flowing and overflowing of data can be kept under control.
Data Management: Managing the data is as important as the data storage techniques. The complexity of the data depends on the management program used in the database for retrieving the required information, and an effective management program depends on organizing the information sequentially. The data management program focuses on preserving data privacy, sensitivity, confidentiality, accuracy, reliability, and availability. Rules and functions are applied to preserve data privacy by protecting the information from unethical access by third parties, which in turn helps maintain data confidentiality and accuracy, so the user receives information that is reliable for future use. An ethical code of conduct should be implemented to secure the data management framework against unauthorized access.
Process Management: The process can be simplified by dividing it into several fragments so that the frames can be effectively resolved to give the final answer. The processor should be capable of processing six gigabits of information within twenty nanoseconds, which helps in achieving accuracy and minimizing response time. High computation speed also helps in reducing the complexities of network congestion. The examination of process management techniques depends on the information gathering process used for collecting data from different platforms, the arrangement of resources on the demand of the user, the modification and manipulation of data according to the request placed by the user, the simulation of the information, and the visualization of the information.
The systematic arrangement of big data comes with various challenges which have to be handled proactively so that the developed big data model is effective at managing information in structured and semi-structured formats. Information that is available in an unstructured format is systematically synchronized so that the data transmission process can be sped up and quality of service can be provided to the participants. Several major areas relating to the data and information should be taken into consideration while developing a big data model for managing voluminous data. The available data models are not effective in managing the realm of big data. The stability of the data depends on organizing the big data in a systematic format, and a functional approach is used for operating on the big data so that it can be easily handled and retrieved on the demand of the operator. Some of the major areas are discussed below:
Ontology application: Ontological methods are used for identifying the connections between ontological assets so that semantic relationships can be identified. The semantic relationships within the data should be identified thoroughly to provide effective mapping and reasoning over the information provided to the user. Automation in the big data framework can otherwise create problems in managing the information in a sequential order.
Data confidentiality and security: A pre-handling process should be applied to protect the data from leakages so that the accuracy of the information is preserved. Private trading of the information should be stopped to provide security and preserve data confidentiality, and cryptographic methods should be applied for maintaining data accuracy. Priority is given to data assurance so that the transmission of information to third parties is minimized. Large-scale arrangements should be made for handling gigantic data volumes and overcoming the problem of big data storage while keeping the information accessible.
Transportation process and storage capability: The customization of information helps in securing the distribution of data across different centres. Sensor information should be preserved by applying processes and procedures that manage the information securely.
Data accessibility: Computational complexities should be resolved to increase the accessibility of data from various platforms such as social networking platforms, IOT applications, company databases, and others. A strategy should be developed for managing communications and upgrading the innovation at the server side, and the legitimacy of the information should be secured for managing the flow of information across different channels.
Inconsistent flow of information: A multi-dimensional view of big data, targeting innovations in the big data modelling structure, helps in reducing the inconsistent flow of information. Wrapping and mapping the information improves the data management and storage process. The focus should be on managing the information at different levels, such as the data level, the physical level, and the information retrieval level. The ability to manage the information depends on temporal and functional interdependencies, and inconsistency can be resolved by developing strategic management of the big data and storing it in a systematic, structured format.
Information transmission: The expansion of applications increases the pressure of retrieving information from different platforms and arranging it in a predefined structured
format so that the transmission time of the information can be effectively minimized. The varied information should be stored in a sequential and synchronized format. The potential of the transmission process depends on the design of the data sets, and the mobilization of the information focuses on minimizing the time and space complexities during the data transmission process.
Data analysis and storage mechanism: Difficulties in managing data and information increase the time and space complexities. A suitable approach should be implemented for handling voluminous data in the desired, effective manner. On-demand transmission of information and data increases the complexity of providing relevant information. The big data model should therefore be prepared to overcome the computational complexities of running processes and procedures on large datasets, and the data can be visualized in a scalable format while preserving information security. The exponential growth of data and information depends on the assemblage of mobile and remote-sensing devices; data is also generated by radio frequency identification readers placed in IOT applications. The accuracy and accessibility of the information at the required time is a prerequisite for effective handling and control of IOT devices.
Computational complexities and knowledge management: A knowledge management program helps in minimizing the complexities of transmitting information from one source to another in a minimal period of time by managing the data in semi-structured and structured forms. Authentication and control programs help in preserving the principal components of the data transmission program, and scalability increases the communication between different data sources and applications. Real-life problems can be resolved by applying hybridization techniques. The datasets should be arranged in a sequential format so that meaningful information can be retrieved on time. The inconsistency associated with voluminous data is the major problem in managing big data systematically; a mathematical approach should be implemented for managing the complexities associated with big data, as well as the uncertainty associated with the large datasets.
Big Data modelling progress and research trends
Huge progress has been made in big data modelling due to the invention of cloud computing technology, the Internet of Things, mobile networking, and Android smartphones. Data warehousing and data mining technologies play a vital role in managing data in a structured format, and the innovation of software defined networking helps in the transmission of big data in the cloud computing environment. Traditional database technologies are ineffective for managing voluminous data, and they do not sufficiently secure the privacy and confidentiality of the information against data leakages. Trillions of records of big data require innovative technology for handling large data sets in the required format. The efficiency of data retrieval can be improved by synchronizing the data into small chunks and fragments, and the mobility of data can be controlled by installing and implementing MapReduce technology. The consumption and accessing of the data can likewise be improved by introducing MapReduce, and the acquisition and transmission of structured, semi-structured, and unstructured data can be done effectively with it. The following diagram shows the application of MapReduce for managing the data produced from social networking, website clicks, mobile networks, IOT applications, and other sources.
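Alongside that diagram, the following minimal Python sketch illustrates the map/shuffle/reduce phases using a word count over a few hypothetical click-stream records; the record contents, function names, and single-process execution are illustrative assumptions rather than an actual Hadoop job.

from collections import defaultdict

def map_phase(record):
    """Emit (word, 1) pairs for every word in one input record."""
    for word in record.lower().split():
        yield word, 1

def shuffle(mapped_pairs):
    """Group intermediate values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in mapped_pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Aggregate all values emitted for one key."""
    return key, sum(values)

if __name__ == "__main__":
    records = ["user clicked product page",
               "user shared product link",
               "sensor reported product temperature"]
    mapped = (pair for rec in records for pair in map_phase(rec))
    results = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
    print(results)   # e.g. {'user': 2, 'product': 3, ...}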
The progress of big data is seen in the areas of cloud computing, data management and engineering, and sensor data, and new opportunities have been seen in virtualization planning. Big data methodologies have been created for storing data in a structured format. Software defined networking is a promising technology for applying the MapReduce concept, and RCFiles are used for controlling the utilization of storage space. Remarkable innovation has been seen in the management of data in the cloud computing environment. The SDN technology makes use of RapidMiner with Hadoop (Radoop) for the systematic arrangement of data in the database. The technologies used for managing big data in recent years are discussed below in detail:
Data warehouse and storage unit: The storage of big data is typically done outside the traditional relational database by making use of NoSQL. The handling of the relational model is done statistically through the MapReduce algorithm, which arranges the data systematically in column format,
row format, or hybrid format. The Record Columnar File (RCFile) applies the MapReduce technology for storing big data in a structured format. The organization of the data in the Hadoop file structure helps in retrieving the data with a lower response time and makes query processing easy. Dynamic loading of the required data can be done effectively by using the RCFile. The following diagram shows the infrastructure of data organization using the RCFile in a systematic architecture.
The diagram above shows the cluster organization of data in the Hadoop distributed file structure. A run-length encoding algorithm is used for storing the data in columnar format, and decompression techniques are used for improving the processing speed over the row and column data.
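As a rough illustration of the run-length encoding mentioned above, the short Python sketch below compresses one column of repeated values and expands it again; it is a simplified stand-in for the compression used inside an RCFile row group, not the Hadoop implementation itself.

def rle_encode(column):
    """Collapse consecutive repeated values into (value, count) pairs."""
    encoded = []
    for value in column:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return encoded

def rle_decode(encoded):
    """Expand (value, count) pairs back into the original column."""
    return [value for value, count in encoded for _ in range(count)]

column = ["IN", "IN", "IN", "US", "US", "IN"]
packed = rle_encode(column)          # [['IN', 3], ['US', 2], ['IN', 1]]
assert rle_decode(packed) == column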
Software Defined Network: The transmission of big data in the cloud environment can be
effectively done through the software defined network. The data can be transmitted over wide
area network by developing the hotspots in the campus network. The scalability in the
transmission of data packets can be optimized by placing the SDN controller (Thillaieswari,
2017). The switching technology is implemented for improving the network control and
performance. The delays in the packet transfer can be reduced by the installation of the SDN
controller. The run time processing of data can be improved by using the direct connectivity
of servers. The following diagram shows the formulation of managing the big data by using
the software defined technology in an effective manner.
The virtualization of the big data can be done by using the programmable component of
software defined network. The synchronous approach is used for collecting the data from
different platform. The machine learning algorithm is used for reducing the processing time
in dealing with big data of cloud environment (Tykheev, 2016). The virtual machines are
placed for rebooting the architecture and managing the traffic congestion by migrating the
data packets to other path.
Data Analytics: The analysis process is the most crucial step applied to big data. The processing capability of big data management can be improved by implementing the Hadoop file structure and MapReduce technology. The organization of the physical data is done on the basis of data indexes and layouts. Starfish is a commonly used channel for managing big data analytics on the Hadoop file structure; it helps in improving performance by tuning the configuration of the data stored in the database through the mapping process (Bhosale, and Gadekar, 2014). The integration of RapidMiner with the Hadoop file system (Radoop) and data analytics techniques results in balancing the workload between the different units of the database, so that the scalability and feasibility of the data can be achieved effectively. The data volume and data transmission process can be
managed and balanced properly in the Radoop architecture which in turn results in balancing
the network congestion. The figure below shows the Radoop architecture in managing the big
data.
In the Radoop architecture, the Rapid miner gets interconnected with the Hadoop file
structure system for managing the data collected from different platform (Silva, and et. al.,
2018). The installation of RapidMiner allows all operations to be performed in parallel on HDFS, automating the data analytics process. The visualization of the data can be carried out easily with no knowledge of program code, and the setup of RapidMiner is easy to manage. It helps in accelerating the process of big data analytics by providing a visual and graphical environment. Data can be transmitted within a fraction of a second between the sender and receiver while managing the confidentiality and sensitivity of the information (Sin and Muthu, 2015), which makes it effective for handling the sensor data used in IOT applications. Data access and management can be done effectively with the RapidMiner Hadoop file structure. The graphical charts include line, box, histogram, bar, pie, and other chart types for representing the data in an understandable format. The installation of WEKA tools and SAS can be done with minimum cost, and the operator framework is powerful enough to handle arbitrary processes as well.
Big Data in cloud computing environment
Tremendous progress has been seen in the terabytes of data generated within cloud computing technologies and environments. The resource management program in cloud computing is the biggest challenge in managing multiple users at a time (Senthikumar, and et.al., 2018). The distribution of resources among the member participants is the core step in applications which generate big data, and the control of multi-cluster clouds should be done effectively by using virtualization planning.
Big Data Models and their applications
The handling of big data can be done by using the most promising data models for storage, analytics, and processing of the data. Clustering of data sets is required for handling large data sets, and the scalability of the data can be achieved by implementing big data models that store the data in a structured format (Koman, and Kundrikova, 2016). The programmability of the data can be achieved by implementing mechanisms of indexing and key values. The systematic organization of the big data can be done effectively in the formats depicted in the diagram below:
The spectrum of the big data can be managed in block based, file based, and object based
organization which are discussed in detail below:
Block Storage organization of Data: The classification of data is organized in the block
structure for improving the data retrieval rate and performance. The installation of the fibre
channel is based on meta-data synchronization (Mo, and Li, 2015). The object based
transaction can be effectively organized for systematic retrieval of required information. The
figure below shows the architecture of block based data storage model.
The data is stored in blocks or chunks of a specified size. Unique identifiers and index values are used for fetching the required information from the data blocks, and access to the information is mediated through protocols (Zaino, 2016). The mapping of block IDs is done to get the response for the query generated by the user. Block nodes of fixed size are created for storing the data, the final result, and the metadata.
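A minimal sketch of this block-based organisation is given below: data is cut into fixed-size chunks, each chunk gets a generated identifier, and an index maps identifiers back to blocks so the original data can be reassembled. The block size and identifier scheme are illustrative assumptions, not those of any particular SAN product.

import uuid

BLOCK_SIZE = 4  # bytes per block; real systems use far larger blocks (e.g. megabytes)

def store(data: bytes, block_store: dict) -> list:
    """Split data into fixed-size blocks and index each block by a unique ID."""
    block_ids = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block_id = str(uuid.uuid4())
        block_store[block_id] = data[offset:offset + BLOCK_SIZE]
        block_ids.append(block_id)
    return block_ids

def retrieve(block_ids: list, block_store: dict) -> bytes:
    """Reassemble the original data by looking up every block ID in order."""
    return b"".join(block_store[block_id] for block_id in block_ids)

store_backend = {}
ids = store(b"inventory-records", store_backend)
assert retrieve(ids, store_backend) == b"inventory-records"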
Amazon Elastic Block Store (EBS) model: Instances and objects are created for storing the data in a storage area network architecture, and external devices can be used for storing the data as backup support. Data retrieved from Amazon Web Services is typically stored in the Amazon Elastic Block Store, which makes use of a software defined network (Wu, Sakr, Zhu, 2017). The searching of files in the cloud environment is done by using the EBS data storage unit. Virtual machines are installed for overcoming the problems of data loss and loss of confidentiality, and failures in data retrieval can be minimized.
Open Stack Nova Storage Model:
The Nova Working structure provides the open stack based storage model for preserving the
large data set in the specified order. The replication and snapshots of the data are handled by
the “Cinder” (Alam, and et.al., 2014). The Cinder technology is used for managing and
creating the large data chunks for optimizing the data retrieval process. The restoring of the
data helps in developing large data volume.
File Data Storage Model:
In file based storage model, hierarchical structures of files are developed. The
implementation of abstraction layer and object based model helps in achieving the scalability
and performance optimization. The organization of the files is carried out in physical storage.
The collections of data nodes are required for managing the distributed file structure. The
fault tolerance capability of finding the files in the hierarchical structure can be optimized.
The transparency and automation in the file structure can be achieved.
Hadoop Distributed File structure
The Hadoop distributed file system is used for storing the data at the primary level by
maintaining the name node and data node of the application. The clustering of the files is
effectively done for optimizing the performance of accessing the required data and
information. It helps in achieving the scalability. The reliability of managing the big data can
be improved by implementing the map reduce technologies in distributed file system. The
transfer of data and information among the different nodes can be effectively done. The
information is broken down in various blocks. The efficiency of accessing the data can be
increased by applying the concept of parallel processing system. The replicated copies of files
are stored at distributed servers for managing the data retrieval process. The HDFS
architecture is based on master slave format. The clustering of the files is done through the
name node. The input-output retrieval of information can be maximized by implementing the
general parallel file system. The figure below shows the architecture of the Hadoop
distributed file system.
The description of HDFS architecture is summarised below:
Name node: The hierarchy of the file is organized in the Hadoop file structure by
synchronizing the name node in sequential order. The files are stored under the directories.
The modifications in the records can be done by providing the permission of accessing the
node. The namespace tree is developed by managing the name node. The mapping of the data
nodes is carried out for retrieving the required information from the selected nodes. The
clustering of the name nodes is used for executing the various applications simultaneously.
The checkpoints are organized in the name node for restarting the applications from the
selected storage directories.
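The name-node bookkeeping described above can be pictured with the small Python sketch below, in which a namespace dictionary maps file paths to ordered lists of block IDs and a directory listing becomes a prefix lookup; this is a conceptual simulation, not the actual HDFS data structures.

class NameNode:
    """Toy namespace: maps file paths to ordered lists of block IDs."""

    def __init__(self):
        self.namespace = {}          # path -> [block_id, ...]

    def create_file(self, path, block_ids):
        self.namespace[path] = list(block_ids)

    def get_blocks(self, path):
        return self.namespace.get(path, [])

    def list_directory(self, directory):
        prefix = directory.rstrip("/") + "/"
        return [p for p in self.namespace if p.startswith(prefix)]

name_node = NameNode()
name_node.create_file("/warehouse/sales.csv", ["blk_1", "blk_2"])
print(name_node.list_directory("/warehouse"))        # ['/warehouse/sales.csv']
print(name_node.get_blocks("/warehouse/sales.csv"))  # ['blk_1', 'blk_2']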
Image organization: The metadata is organized in a block which is called an image, and the images are stored in RAM. Checkpoints are used for managing the file system by identifying the name node of the records. The clients send acknowledgements of receiving the data to the administrator. Multithreading functions are applied for managing the relationship between multiple clients and the sending of data nodes. Directories are used for storing the records in sequential order to serve the clients. The processing of the data can be optimized by applying flushing and synchronizing functions to the records based on the name node; the transactions of the process can also be checked through these flushing and synchronizing functions.
Data Nodes: The replicas of the blocks are represented by the data nodes on the local file system. Checksums are used for managing the data in the first block and the metadata of the records in the second block, and time stamps are generated for fetching the data. The performance of data transmission can be optimized by applying a handshake protocol in the architecture of the data nodes. The namespace ID uniquely identifies a data node in the cluster organized on the Hadoop distributed file structure, and the handshake protocol is used for registering the data node with the name node. The replicas of the blocks are uniquely identified on the distributed servers by using the namespace ID, and the transfer of information is done by scheduling the allocation of blocks on the basis of the namespace ID. The request to retrieve information is sent directly to the data node rather than to the name node, and the data nodes issue commands to the replicas for accessing the required information from the record. The integrity of the system can be managed effectively by applying the name node operations.
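The checksum idea used by the data nodes can be sketched as follows: a digest is computed when a block replica is written and verified again before the replica is served, so a corrupted block can be detected and fetched from another replica instead. The use of SHA-256 here is an assumption chosen for brevity; HDFS itself uses CRC-based checksums.

import hashlib

def write_replica(block_data: bytes) -> dict:
    """Store a block together with the checksum computed at write time."""
    checksum = hashlib.sha256(block_data).hexdigest()
    return {"data": block_data, "checksum": checksum}

def read_replica(replica: dict) -> bytes:
    """Verify the stored checksum before handing the block to a client."""
    if hashlib.sha256(replica["data"]).hexdigest() != replica["checksum"]:
        raise IOError("block replica is corrupted")
    return replica["data"]

replica = write_replica(b"row-group-17")
assert read_replica(replica) == b"row-group-17"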
Client: The client is provided with an interface linked to the HDFS library for performing operations such as reading, writing, and deleting files from the directories. The metadata replicas are stored at distributed servers in remote locations. The required data nodes can be reached effectively by managing peer-to-peer transmission of data, and the data is searched by indexing it through the name node. A block of data can be retrieved from the distributed servers through an iterative process which is illustrated in the figure given below; the figure shows the user-side steps for creating a new file on the Hadoop distributed file system.
Checkpoints: The HDFS file structure depends on checkpoints for accessing the desired data from remote locations. The request of the client is checked by creating checkpoint nodes
for indexing the data with the help of name node. The downloading and uploading of the files
can be done by searching for the namespace in the HDFS. The searching of the data
automatically starts with the checkpoint if the failure of the name node occurs.
Backup nodes: The backup nodes are used for creating periodic checkpoints for the requests placed by the user. The transaction can be sped up by searching the data through the active name node, and the data about the active name node is extracted from the backup node. The checkpoints are not downloaded by the backup nodes for creating the latest record in the namespace of memory, and only read operations can be performed on the checkpoint nodes. An updated image is created in the namespace for synchronizing the name node in a systematic order, and the clustering of the nodes is carried out on the basis of the name node.
Snapshots of the upgraded file system: The probability of bugs and obscure faults in the file increases with human intervention. The current state of the file can be recovered by rolling the process back to a checkpoint, and the clustering of the file snapshots helps in finding and searching the file among the replicas of data stored at the distributed servers.
Read and write operations on a file: The append operation can be performed for adding new information to an existing file, the read operation is performed for reading the data of a selected record, and the write operation is used for creating a new file at the distributed server. The writing of data takes place between the soft limit and hard limit of time. The creation of a block takes place by providing a unique ID to the block, and the replicas on the hosts are used for organizing the data nodes serving the client into a pipeline. The packets are pushed through the pipeline, minimizing the distance between them, to accelerate the transmission rate of the packets between the sender and the receiver. The diagram given below shows the pipelining structure for writing new data to the data block.
The map reduce technique is applied on the clusters of batching processing units for writing
and reading the data from the block. The sequence of data nodes can be tested from the
checksum points.
Synchronization of the block:
The synchronization of the blocks depends on the flat topology for accessing the multiple
data. The distance between the data nodes is used for determining the network bandwidth.
The read and write operations are used for determining the configuration of the applications.
The cost of the name node can be minimized by sending the packets in the pipeline. The
location of the host can be checked by finding the aggregation of the distributed cluster of
blocks.
The architecture of the HDFS is composed of data nodes and name nodes which is used for
managing the file on the distributed server. The creation of the block is composed of read and
writes operations.
The replication of the data takes place to form a large cluster on the machines. Blocks of the same size are stored sequentially in the database to minimize faults when fetching the data from the file, and the configuration of the file depends on the replication factors. The data nodes are recorded in the block map to minimize the replacement of replicated data. The TCP/IP protocol is used for managing the communication between the data nodes and name nodes for balancing the cluster, maintaining its integrity, and minimizing failures of the metadata. The rebalancing of the data nodes and replicated data helps in managing the rebalancing of the clusters. The following diagram shows the list of blocks contained in a data node.
HDFS is effective for managing big data in a structured format and increasing the processing of the data stored in the data nodes. The interaction between the data nodes can be improved by implementing a command interface between them, and the cluster status can be checked by maintaining the balance between the data nodes and name nodes. Data in the file system can be retrieved by applying a streaming process, and faults in the data and information can be identified by implementing the Hadoop file structure for storing the big data of the applications.
Network File System
The network file system (NFS) is used for managing distributed files between client and server machines. Remote procedure calls are used for implementing communication involving the files and directories. The transparency of the client-side and server-side functioning can be improved by applying the remote file system at the kernel level. The booting time of the client machine can be reduced by limiting the consumption of data by a single host machine, and the integration of the virtual file system with the kernel helps in minimizing the
booting time of the process. The diagram below shows the NFS architecture for managing the data between the client and server machines.
The functioning at the kernel level can be optimized by implementing the network file system. A block ID is provided for performing read and write operations on the data and information according to the query processed by the user. The datagrams of information can be managed by sequencing the flow of packet control with the help of the user datagram protocol (UDP). The operations performed on NFS are categorised as create, rename, link, open, rmdir, read, write, and others. Multiple operations can be performed by applying the open network computing (ONC) remote procedure call, and the mount protocol and UDP are used for accessing files from remote locations. A naming process is used for securing the information on the network from unauthorised access, and files are stored with four attributes: type, size, change, and a unique identifier for the file. The sharing of files on the network is done by managing semantic sessions, and the virtual file system is used for handling the remote files. Access to a file can be
achieved by applying the open operation. The caching process is used for processing the
request of the user.
Object Based Data Model
Object based data models are used for storing data related to real-life entities. The relationships and mapping between the entities can be captured effectively by developing the object model, and the isolation of one object from another is achieved through the mapping process. For example, an employee database can be developed for an organization. The objects are collections of metadata and data (Beakta, 2015). Every object is equipped with name-value pairs which describe the characteristics and attributes of the entity concerned. The content type of the data is used for modifying and storing the data in the specified format, and unique identifiers are used to uniquely identify the data and records in the database.
Advantages:
The implementation of the object based storage model supports the inheritance property for maintaining the characteristics and attributes of different objects in a structured format. Flexibility and ease of change can be achieved (Tan, and Chen. 2014), and program code and data can be reused through the inheritance property.
Disadvantages:
The object based storage model has seen limited adoption by users and is not tied to one particular technology.
Any type of data can be stored in the object oriented database, such as images, records, and multimedia. The abstraction layer helps in mapping the data by assigning the mapping process, and the data is logically organized in systematic order. User objects are created for managing the metadata used to access the required information from the database. The sharing of data across the database can be done by initiating the metadata server, and the data can be retrieved in sequential order. The performance of the object based storage model is high because it allows direct access to the data by analysing the metadata. Security is the major concern with the OSD model because the data is stored at the logical level (Silva, 2018); data access is given to the user by authorizing the data path retrieval process. Sub-object identifiers are allocated to identify the correlation between the physical and
logical objects. Offloading functionality is provided to the components for implementing the transaction process effectively. The metadata is synchronized to enable the sharing of objects across parallel processing units and is used for providing uniformity to the data collected in the database (Koseleva, and Ropaite, 2017). The persistence of the objects can be uniquely identified by navigating the temporary objects in the database. The diagram below shows the architecture of the object based storage model:
In the diagram above, the architecture shows the OSD model divided into three layers which are classified as logical data modelling, physical data modelling, and creating and updating data. The processing of the data is managed through the process models, data requirements, technical requirements, performance requirements, and organized business data. The required data can be retrieved from the physical and logical databases.
Amazon Simple Storage Service Data Model
The Amazon Simple Storage Service (S3) model is an interface used for storing and retrieving data over the internet. Amazon web services can be used by the user after getting authorised access control through the authentication process. This model is based on the storage of objects for managing files and records in the cloud. An ID number is allocated to the metadata for easy retrieval of information, and a block of data can be retrieved by using the application program interface. Data in gigabytes and terabytes
can be effectively stored in this model. The advertised durability of the objects stored in this model is very high, on the order of eleven nines (99.999999999%), and information is securely transmitted between sender and receiver. The frequency of accessing the data can be very high, and the data and information can be distributed among dynamic websites. Backup support is available for disaster recovery, and the availability of the data on the internet is also very high. A web hosting infrastructure is implemented for managing the accessibility of the objects, data, and information among the users, and static HTML websites can be easily hosted on the Amazon S3 model for storing the data conveniently for the application.
Advantages:
The Amazon S3 model is used for creating buckets and storing data and information in them. The buckets act as containers for managing the data in a systematic order and can hold terabytes of data. A unique key is allocated to each object for identifying the data uniquely in the bucket, and the downloading and uploading of data over the internet can be done effectively.
Objects: A standard interface is used for managing the toolkit for accessing the required data. The fundamental entities of the Amazon S3 model are known as objects, which are collections of metadata and data. Every object is equipped with name-value pairs which describe the characteristics and attributes of the entity, the content type of the data is used for modifying and storing the data in the specified format, and unique identifiers are used to uniquely identify the data and records. The objects are composed of two parts, keys and versioning. The keys are used to specify the value of the data in the bucket, and each object in a bucket has exactly one key. The retrieval of the information and data therefore depends on three things: bucket, key, and version.
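A hedged example of the bucket/key/object model described above is shown below using the boto3 client for Amazon S3. The bucket name, object key, and payload are hypothetical placeholders, and the snippet assumes AWS credentials are already configured in the environment; put_object and get_object are standard boto3 client operations.

import boto3

# Assumes AWS credentials are configured; bucket and key names are placeholders.
s3 = boto3.client("s3")

# Upload an object: the bucket is the container, the key uniquely identifies it.
s3.put_object(Bucket="example-inventory-bucket",
              Key="warehouse/stock-levels.json",
              Body=b'{"item": "bolt", "quantity": 120}')

# Download the same object by bucket + key.
response = s3.get_object(Bucket="example-inventory-bucket",
                         Key="warehouse/stock-levels.json")
print(response["Body"].read())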
Regions: The location is specified for the particular data in the record which helps in
minimizing the latency period and cost for managing the specified requirement of the user.
The read and write operation can be effectively operated on the data by getting the access
control through the process of authorization. The atomic values are allocated for successful
retrieval of the desired information from the cluster of data records. The propagation and
writing of the data can be done in the bucket itself. The processing of the operations
performed in the amazon S3 model is described in the following format.
This model is best suited for managing and storing the tremendous data generated from web services on the internet. The cost of using the Amazon S3 model is considerably lower than other models (Cognizant, 2014), and it provides high scalability and durability for storing information securely over the internet.
The architecture of the Amazon S3 model provides security because the data can be stored in encrypted form; encryption can take place in two different styles, named client-side encryption and server-side encryption. The data can be regenerated from a corrupted form, and the retrieval of information can actively take place by adding the versioning process. The durability of the information can be improved by adding checksums to the replicated data, which also helps in managing network congestion by transmitting the right packet to the right destination. The data is stored virtually in the Amazon S3 model and organized on the basis of keys. Data which is required and accessed by the user regularly is stored under the frequent-access storage class of Amazon S3, which helps in generating an instant response to the request. Data which is not required or accessed regularly is stored under the infrequent-access storage class, which helps in saving time and cost for the user. The data is organized in the Amazon S3 model in a bucket architecture. The transferring of data takes place in two ways: one is transfer acceleration with the CloudFront caching service, and the other is the Snowball transfer process.
In transfer acceleration, the data is transferred by using the CloudFront caching service, which enables faster and more secure access to data; the Amazon CloudFront caching service moves the data to the nearest edge location to optimize the flow of information. The physical transfer of data takes place through the Snowball process, in which data is transferred from one location to another in large batches. The latency of data transfer is low with high throughput, and it is a less expensive storage technique for managing the large and tremendous data generated from web services and internet applications.
OpenStack Swift Data Model
The OpenStack Swift data model is an open-source and cost-effective data storage technique used for managing big data effectively in clusters. This model plays a vital role in the development of the cloud service model for managing the networking and computational components. The Swift software is freely available under the Apache 2.0 license. It provides backup support for the static data of the internet and can store emails, audio and video files, virtual images, and many other items. The data is stored in a binary structured format, and the attributes of the objects are related to each other through the metadata. The architecture of the Swift model is based on proxy servers and storage nodes (Wang, and et.al., 2018). The transmission of data for read and write operations takes place over the HTTP protocol, and the Put () and Get () operations are used for storing and accessing the data and metadata. The data is stored in the Swift cluster in binary format, a hash ring is used for finding the location of the data via the proxy server, and replicated copies of the data are stored on the storage nodes (Acharja, and Kauser, 2016). The diagram given below shows the architecture of the OpenStack Swift data model.
In this diagram, we can see that the objects are stored horizontally for managing petabytes of data. The servers are partitioned within the OpenStack deployment, and a mapping process is implemented for retrieving the information from the data storage units. Partitions of data are assigned to the storage units, real-time data can be collected from the proxy server, and the data can be represented in block storage and the object store.
Block Storage: The block storage service is also named Cinder. It enables persistent storage for virtual machines, and the performance of the blocks can be maximized by integrating them under the storage services. The application programming interface helps in managing the volume of data on the virtual machine, the performance of data retrieval can be maximized by integrating the block storage, and data can be extracted easily from the block storage.
Object Store: The object store is capable of managing the OpenStack data, and the implementation of Swift functions is used for accessing the data from the virtual machines. The data is stored in
unencrypted format. Key-value pairs are allocated to the data and blocks for managing integrity and security.
In the Swift data storage model the consistency of the data can be managed effectively, and the queries of the user are answered with up-to-date information. Multiple copies of the replicated data are stored in different locations so that the user can be provided with instant information, and the data can be retrieved from another node if a data node fails, which helps in accelerating the performance of the storage unit. It is based on a distributed storage system, which improves the scalability and availability of the information, and the efficiency gap in retrieving information can be filled by implementing the Swift storage system for managing terabytes of data.
Characteristics of the Swift model:
Open storage of data can be managed effectively in the Swift model. It is based on an open-source architecture, so the software is freely available, and the clustering of the data can be done easily. It can act as a standalone data storage unit for storing the big data generated from the cloud computing environment, and Linux is the most suitable operating system for it. Consistency of the data is provided even though the information is kept in an unstructured format. A URL is allocated to every item in the Swift storage unit, and the metadata is used as an index for finding the data in the database. The HTTP API interface provides the communication link between sender and receiver. The data is stored in replicated form in different locations so that the availability of data can be provided to the user in a timely manner. It is a cost-effective technique because additional data nodes can be added for storing the big data, and the information of the cluster is upgraded periodically. The run time can be minimized by enabling effective swapping of the data among the data nodes and name nodes; in the same way, unused information can be deleted from the database to make memory space for new information. A Swift request is handled over the HTTP protocol, with authentication of the information and data done by allocating a unique URL to it. The metadata is used as an index for searching the required information in the database. The URL address is comprised of two parts: the first part gives the location of the cluster and the second part gives information about the container where the data is actually stored. The HTTP request reveals that a request has been placed by the user. The authentication function provides the URL of the storage unit, and the storage URL connects with the user account for retrieving the information uniquely from the
database. The diagram below shows the flow of retrieving information in the OpenStack Swift model:
The operations which are performed on the Swift model are listed below:
Get (): downloads an object, or the list of objects in a container or account of the user.
Put (): creates a new container, or uploads a new object into the stack, replacing the metadata headers with new information to accelerate searching for the required information in the database.
Post (): updates the metadata in the user account.
Delete (): removes an object, or an empty container, from the account so that space is available for new information.
Head (): retrieves only the metadata headers of the stored data.
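The operations listed above can be exercised with any HTTP client. The sketch below uses the Python requests library against a hypothetical Swift storage URL and auth token; the endpoint, container, object, and token values are placeholders that would normally come from Swift's authentication step.

import requests

# Placeholders: a real storage URL and token are returned by Swift's auth service.
STORAGE_URL = "https://swift.example.com/v1/AUTH_demo"
HEADERS = {"X-Auth-Token": "example-token"}

# PUT: create a container, then upload an object into it.
requests.put(f"{STORAGE_URL}/inventory", headers=HEADERS)
requests.put(f"{STORAGE_URL}/inventory/stock.csv",
             headers=HEADERS, data=b"item,quantity\nbolt,120\n")

# GET: list the container's objects, then download the object itself.
listing = requests.get(f"{STORAGE_URL}/inventory", headers=HEADERS)
payload = requests.get(f"{STORAGE_URL}/inventory/stock.csv", headers=HEADERS)

# HEAD: read only the metadata headers of the object.
meta = requests.head(f"{STORAGE_URL}/inventory/stock.csv", headers=HEADERS)
print(meta.headers.get("Content-Length"))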
Swift Processes: The Swift cluster makes use of four processes for managing the account information of the Swift server. The processes used for tracking the information are named the proxy process, account process, object process, and container process. The proxy node of the Swift server is run by the proxy process, and the storage nodes are collections of the various processes in a single unit. The Put () function is used for sending a request to a new account operated by the server processes. The consistency of the data and information can be balanced by maintaining the information of the storage unit. The diagram below shows a pictorial representation of the storage node:
The layers of the Swift OpenStack model are explained below.
Proxy Layer: The proxy server is used for managing the interface with external clients so that information can be effectively shared between them. The handling of user requests can be
accelerated by incorporating an API in the proxy server to generate the request-response code. Two proxy servers are employed in the architecture so that one can take over if the other fails.
Account Layer: The account layer processes handle the metadata in sequential order according to the list in the user account. An index is prepared on the basis of the information stored in the metadata headers, which is used as a key value for searching the required information in the database. The account server is responsible for generating the link to the URL of the cluster storage.
Container Layer: The information stored in each account can be accessed through the metadata. Requests are sent to the container layer for fetching the desired information from the stack (Kim, and et.al., 2016).
Object Layer: This server keeps track of all the objects so that relevant information can be provided to the user according to their request and proper communication maintained between sender and receiver. The extraction of the information takes place on the basis of time stamps, which are used for generating multiple versions of the required object. The extension of the attributes is done by the object server for keeping the records in a single unit (Brand, 2014).
NoSQL
The NoSQL database model is the most frequently used data model for managing the big data of web services. Voluminous data can be stored in it effectively because it supports a flexible schema and manages data in semi-structured and unstructured formats. It does not depend on a rigid schematic design of the database for balancing terabytes of data (Choi, and et.al., 2014). There are four different data models supported by the
NoSQL architecture, which are named Key-Value Stores, Document Stores, Column Family Databases, and Graph Databases. Each of these database models is explained below:
Key-Value Data Stores:
Every piece of information stored in the database is provided with a unique key. The key acts as a pointer for retrieving the relevant information from the data store, and the values are typically stored as scalar data types or opaque values. The Get () and Put () functions are used for reading and storing values in the data store, and the read and write operations can be performed at a highly accelerated speed. It is an effective model for managing session data between the sender and the receiver, fetching the required information within a fixed time from the data store. A data caching process can be implemented easily by managing the configuration of the information. Large memory spaces can be created for storing multimedia files because the model supports variable-length values. The complexity of queries can be reduced by providing specific data values that represent the relationship between the data and keys.
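A minimal in-memory key-value store along these lines might look as follows; the get()/put() names mirror the operations mentioned above, and the session-caching usage is only illustrative.

class KeyValueStore:
    """Toy key-value store: every value is reached through its unique key."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value          # overwrite-or-insert semantics

    def get(self, key, default=None):
        return self._data.get(key, default)

sessions = KeyValueStore()
sessions.put("session:42", {"user": "warehouse-admin", "expires": 1700000000})
print(sessions.get("session:42"))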
Document Store: In a document store the data is stored in a format such as XML, and is therefore called a document. The objects and metadata are represented by binary values, and an index is provided for finding the data in the database. The schema of the document store is flexible, allowing data to be stored in the desired format by applying de-normalization techniques. The diagram below shows an embedded document stored in the document store.
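Since the referenced diagram is not reproduced here, the snippet below gives an illustrative embedded document of the kind a document store would hold (shown as JSON for readability, although the text mentions XML); all field names and values are hypothetical.

import json

# One self-contained document with an embedded sub-document and an array,
# stored and indexed as a single unit by the document store.
order_document = {
    "_id": "order-1001",
    "customer": {"name": "A. Sharma", "city": "Pune"},
    "items": [
        {"sku": "bolt-m8", "quantity": 120},
        {"sku": "nut-m8", "quantity": 130},
    ],
    "status": "dispatched",
}
print(json.dumps(order_document, indent=2))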
Use cases include content management systems, the development of e-commerce websites for trading, and carrying out complex transactions; multiple operations can be performed effectively on the document store database.
Column Family Database: This database is a systematic organization of column families for managing a sorted mapping of the data. A timestamping process is implemented for getting the required information from the database, so voluminous data can be managed effectively while keeping high availability of the data. A time-series format is used for organizing the data in the column structure (Cao, 2014). The diagram below shows the representation of the column data store for managing a large volume of data in a systematic format:
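In place of the missing diagram, the rough sketch below shows the column-family layout: each row key maps to named column families, and each cell keeps timestamped versions, which supports the time-series style of access mentioned above. The family and column names are illustrative assumptions.

import time
from collections import defaultdict

# row_key -> column_family -> column -> list of (timestamp, value) versions
table = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))

def put_cell(row_key, family, column, value):
    """Append a new timestamped version of the cell."""
    table[row_key][family][column].append((time.time(), value))

def get_latest(row_key, family, column):
    """Return the most recently written version of the cell, if any."""
    versions = table[row_key][family][column]
    return max(versions)[1] if versions else None

put_cell("item:bolt-m8", "stock", "quantity", 120)
put_cell("item:bolt-m8", "stock", "quantity", 95)
print(get_latest("item:bolt-m8", "stock", "quantity"))   # 95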
Aggregation functions are created for answering the ad-hoc queries posted by the user.
Graph Database: Nodes are used for representing the data entities, and information from the data store is fetched by traversing the relationships between the nodes. The
attributes of the data are called properties. Relationships can be represented in unidirectional or bi-directional format. Native graphs are developed for representing the relationships between the data stores, and the relationship format is designed as a subject-predicate-object structure.
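The subject-predicate-object structure can be sketched as a small triple store in which relationships are traversed by pattern matching; the entities and predicates below are hypothetical examples.

triples = [
    ("supplier:acme", "supplies", "item:bolt-m8"),
    ("item:bolt-m8", "stored_in", "warehouse:delhi"),
    ("warehouse:delhi", "managed_by", "employee:42"),
]

def match(subject=None, predicate=None, obj=None):
    """Return every triple matching the given pattern (None acts as a wildcard)."""
    return [(s, p, o) for s, p, o in triples
            if (subject is None or s == subject)
            and (predicate is None or p == predicate)
            and (obj is None or o == obj)]

# Which warehouse holds the bolt, and who supplies it?
print(match(subject="item:bolt-m8"))
print(match(predicate="supplies", obj="item:bolt-m8"))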
The visualization of the data can be performed through various technologies such as Dygraphs, ZingChart, Polymaps, Timeline, Exhibit, Modest Maps, Leaflet, Visual.ly, Visualize Free, jQuery Visualize, jqPlot, Many Eyes, JpGraph, Highcharts, and others. The Dygraphs tool is used for representing versatile, dense data sets, and the compatibility of the data processes should be identified for measuring the errors in sending data packets from one source to another. ZingChart is used for showing the data present in the warehouse in the form of charts, with APIs developed for getting the required information about the data available in the warehouse. Polymaps is used for showing scalable vector graphics of the image process models. Timeline charts are prepared for sequencing the user images in compressed data formats, and the big picture can be retrieved from the timeline to demonstrate the information effectively. Modest Maps is a visualization tool based on design code for analysing the compatibility and testing of real-time data collected from various sensor devices. The data sets can be stored effectively in an interactive data model for improving the performance of the animation programming model, and the input-output processing can be optimized by enabling cascading style sheets. The Leaflet structure is developed for sequencing images of the datasets from the warehouse model. Visualize Free is an open-source model used for uploading interactive images of the defined data sets, and a multidimensional view of the data sets can be generated with jQuery according to the user's demand. The visualization of big data helps in accelerating the data retrieval process and improves data presentation and understanding, while the quality of the data depends on how the outputs are displayed. Handling procedures should be developed for preserving the data in large data chunks.
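Any of the charting libraries listed above can produce such views; as a neutral example, the hedged snippet below uses matplotlib (not one of the tools named in the text) to plot a hypothetical sales-versus-availability comparison for warehouse items, the kind of statistic the proposed inventory model is meant to expose.

import matplotlib.pyplot as plt

items = ["bolt-m8", "nut-m8", "washer-m8"]
units_sold = [340, 280, 150]       # hypothetical sales figures
units_in_stock = [120, 95, 400]    # hypothetical warehouse availability

x = range(len(items))
plt.bar([i - 0.2 for i in x], units_sold, width=0.4, label="Units sold")
plt.bar([i + 0.2 for i in x], units_in_stock, width=0.4, label="Units in stock")
plt.xticks(list(x), items)
plt.ylabel("Units")
plt.title("Sales vs. availability per item")
plt.legend()
plt.show()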
It is difficult to handle the processing and storage requirements of such data sets. The sharing of information helps in obtaining data from different servers to get interactive results for the user query, while the retrieval of information is restricted so that only meaningful information is obtained from the various sources. Synchronised data should be collected for storing the data in large datasets, and the required information can be extracted to combine the readings of the real-time sensor devices.
Chapter 3: Research Methodology
The qualitative analysis of the big data management system helps in giving direction to our research. It helps in finding an affordable big data model which addresses the limitations of the traditional models used for managing the data in the warehouse. The study of the literature review focuses on finding the gaps in the research and developing an effective model which overcomes the obstacles of the big data management process. The study of the evolution in the architecture and operational programs of electronic data processing systems helps in analysing the research gaps in manipulating and modifying big data according to the user's request. The motivation of our research is the literature survey. The qualitative mode of research results in developing real-time analysis and management of big data with respect to the inventory management system. The upcoming model should provide elasticity for managing the interfaces between the open data, and the invention of NoSQL databases helps in arranging the data without a relational schema. Security is the major problem related to the big data management system, which has to be resolved with the proposed data model. The complete architecture depends upon the placement of the data analytical engine, and the interoperation of the processes can be managed effectively by integrating the system design with the sensor data.
A baseline study of the organizations is carried out to gain knowledge of the problems and challenges faced in the management of the inventory.
Sampling: The study of the population helps in analysing the viewpoints of the member participants and gives an idea of the challenges faced in developing the inventory control parameters.
Questionnaire: A questionnaire was circulated among the employees to record the problems and difficulties faced with the traditional inventory management system, with a focus on the security and privacy issues associated with the inventory control system. The questions were designed to retrieve broad information that gives a clear and concise view of the inventory system and of the proposed solution.
Interview: An interview was arranged with the project manager to gather knowledge about the problems he faces in getting information about inventory underflow and overflow conditions in the warehouse. He does not receive a timely signal for ordering commodities from the supplier, which creates a complex scenario in managing the requirements of the customers because it limits effective synchronization of supply and demand.
Focused Group: A meeting was arranged with a group of experts to collect their ideas and observations on the inventory management problems. A statistical report is prepared on the opinions of the member participants about the advantages and problems of the traditional model used for managing the big data of the inventory management system.
Data Analysis:
A systematic view of the information is obtained by plotting graphs of the opinions and views collected in relation to the inventory management system. Tables are prepared to identify the positive and negative responses of the member participants.
Literature Review: The study of the literature helps in identifying the work already done on managing big data in the cloud environment. New methods are developed by expanding the traditional architectural view to manage big data in a systematic format. The study of related work on the management cycle of big data systems opens the door for research into an expanded inventory control system that analyses the presence of the commodity in the warehouse. Historical data can be collected by developing a general algorithm to manage clusters of jobs, and mapping the tasks helps in reducing the pressure on the data server. Recording the launch and finish times of jobs helps in regulating the flow of tasks, and tracing the files makes it possible to predict the processes that read and write bytes on the data files. The forecasting model helps in managing the inventory cost by developing a time series of the data (Zou, and et. al., 2013). The scheduling of the data nodes helps in managing heterogeneous data and in comparing the formatting of the map reduce function. Hash tables are developed for handling the remote procedure calls that define the scheduling of the data nodes and the name node, and the heterogeneous data is traditionally mapped through the application of the map reduce function.
Clustering Method: The clustering method is used for managing the data held in temporary files. Data is randomly sampled from the database to develop the index value chart, and the reading and writing operations keep the files in systematic order. The index is then used for finding and searching data across the Hadoop clusters, as sketched below.
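To make the idea concrete, the following is a minimal Python sketch, not the implementation used in this study, of sampling records from temporary files and building a small hash index over them; the file layout and the item_id field are assumptions made purely for illustration.

import csv
import random
from collections import defaultdict

def build_index(paths, sample_rate=0.1):
    """Randomly sample rows from the given CSV files and index them by item_id."""
    index = defaultdict(list)  # item_id -> list of (file path, row number)
    for path in paths:
        with open(path, newline="") as handle:
            for row_number, row in enumerate(csv.DictReader(handle)):
                if random.random() <= sample_rate:
                    index[row["item_id"]].append((path, row_number))
    return index

def lookup(index, item_id):
    """Return the sampled locations recorded for a given item, or an empty list."""
    return index.get(item_id, [])

Such an index only points at where matching rows live, so a scan of the indicated files is still needed to fetch the complete records.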
Big data analytics is used for analysing the practical application of the inventory management system. A systematic summary of the processes considered while developing the big data model is shown in the table below:
Processes | Data source | Opportunities | Challenges
Planning | None required | Determining the risks associated with the arrangement and management of the big data | Synchronising the IT resources used for managing the inventory in the warehouse
Planning | None required | Reducing the external control on the business process | None
Planning | None required | Improving the managerial functions | None
Inventory supply | None required | Reducing the capacity of the data storage | Nil
Inventory supply | Smart devices | Managing the value added network of the supply chain | Nil
Inventory supply | RFID reader, GPS location tracker | Intelligence system for tracking the availability of the inventory in the warehouse and the location of the commodity | Nil
Production | GPS location tracker | Intelligence system for tracking the availability of the inventory in the warehouse and the location of the commodity | Nil
Production | Positioning vector | Sensing the information from the large cluster; identifying the connectivity in the material | Nil
Production | RFID reader | Identifying the aggregated pattern of the inventory management system; implementation of data transparency; real time tracking for generating quick responses | Cross functional view of the data complexity
Distributed data | Linking with the LAN network | Tracking the inventory along the optimal route; developing the address verification and validation process | Managing the sales of the commodity
Distributed data | LAN network | Improving the tracking process for inventory control | Logistic program of the company
Distributed data | RFID reader | Maximizing the inventory control management process | Nil
Distributed data | None | Maximizing the logistic program; initiatives of the data management program; reducing cost efficiency | Determining the data scheduling program
Inventory return | GPS controller | Planning the relationship with the government and other external parties | Nil
Inventory return | Positioning vector | Feasibility of the customer data and inventory information | Nil
Inventory return | RFID reader | Sensing the real time information of the inventory | Nil
The development of the big data management model focuses on retrieving the information produced by the sensor devices. The decision making process depends on the valuable information extracted from the big data model, and a static view of the information is helpful for visualizing the presence of the data in a systematic order. The data models are shaped by the physical representation of the data in tables and columns, and the physical design is reflected in the overall design model. The focus should be given to schema design, storage design, ownership and access control policies, data latency period, and data processing. The details of the physical design are represented in the table below:
Schema Design: The schema design defines the physical storage of the data. The relations between the entities should be developed to identify the data structures used for storing the data, with a focus on storing the sensor data information. The Hadoop model is used for storing the real time information because it is a schema-less storage model capable of storing dynamic information in a sequential format.
Storage Design: The focus should be given to identifying the limits of the storage size of the data. The visualization of the data can be managed by implementing the normalization technique. The storage capacity of the Hadoop model is greater than that of an RDBMS. Controlled redundancy of the data provides flexibility to the working model of the data storage units, and the complexity of the storage unit can be minimized by synchronizing the large data sets.
Ownership Management Policies: The user is provided with authorized control for storing the data securely in the Hadoop files over the network, with authentication controls managing the flow of information. A high level schema is developed for managing the relationships with external parties and organizations, and a systematic approach should be followed for defining the hierarchical data structure that stores the artefacts of sensor data in sequential formats.
Access Control Policies: The model is designed to provide access control to the end user so that the relevant information can be retrieved instantaneously according to the query posted on the portal. The information is accessed frequently, so concurrency control is managed on the data units. Aggregated functions are performed for storing the data sequentially on the Hadoop file structure, and the information is transmitted under access control on the data blocks. The data can be accessed directly from the data structure, which helps in accelerating the business functions and data units.
Data Latency Period: The design should anticipate and manage the data latency period. The stream of data is organized so that the real time sensor data is scheduled in a structured format. The periodic updates of the batch processing units can be influenced by the normalization process, while de-normalization helps in reducing the correlation between the aggregated data. A systematic approach should be followed for organizing the data in batches to develop the stream of processes and query responses. The latency period is minimized by sequencing the data in a particular order so that the cleansing and normalization functions can be performed easily on the stored data. The aggregated functions are used for managing information governance and control.
Data Processing: The processing of the data is driven by the read and write operations performed on the data collected from the sensors to reflect the real time scenario. An analytical approach should be laid down for managing the flexibility of the data nodes.
Applying data models for data transactions: The data is transacted from various sources to manage the processes and operations in an effective manner. The raw data can be collected to formulate the business intelligence program that extends and manages the information governance framework. The organization of the data depends on the aggregated functions used for exploring the information in a sequential format. Granularity should be used for dividing the large data blocks into small data sets of similar data, which helps in accelerating the data retrieval process. Fine-grained methods applied to the data transactions handle the security parameters of the information, and trusted data can be collected from external parties through fine granularity, which helps in maintaining authorized control over the data transactions. The Hadoop warehouse is developed for performing the aggregated functions in a systematic format, the information can be circulated according to the user demand, and the external sources are used for managing the data entities that control the flow of information.
The research methodology follows six process steps for designing an effective big data model that manages the inventory data efficiently, as listed below:
Problem Clarification: The literature review helps in finding the gaps in managing inventory and storing big data in a synchronous format. It is difficult to handle the exponential growth of data from sectors such as health care centres, business organizations, artificial intelligence, and IOT infrastructure. The proposed model should provide the elasticity to manage the interfaces between open data sources. The invention of NoSQL databases helps in arranging the data without a relational schema. Security is the major problem of a big data management system and has to be resolved by the proposed data model. The available data models are not effective in managing the realm of big data, and the stability of the data depends on organizing it in a systematic format. Storage and management are the major issues related to big data, so the focus should be given to managing the data in a secured format so that the reliability and accuracy of the information can be effectively increased.
Objective of the proposed solution:
The proposed solution manages the big data of the inventory control system in a structured format. Alert signals are sent to the mobile phone of the user to notify them about the availability of the inventory in the warehouse, and sensor devices are placed in the warehouse to collect the real time condition of the commodity. A systematic arrangement of the database is required for storing and managing the data in a classified format so as to increase the availability of accurate data in a minimal amount of time. An effective and efficient data model is required for balancing the request and response model and for increasing the efficiency of arranging petabytes of data (Agarwal, and Khanam, 2015). The proposed data model should be capable of handling voluminous data in a systematic format while securing the confidentiality, reliability, and accuracy of the information. Consideration should be given to the diversification of the data available from different sources such as social networking platforms, IOT architecture, and artificial intelligence.
Designing of the proposed model:
Jobs are executed by preparing a look up table for extracting the relevant information from the database, and real time operations are performed by implementing OLAP (online analytical processing) technology. Mobile devices are connected to the information retrieved from the job tracker. The data is stored in structured and semi-structured formats so that it can be received effectively, and map reduce technology provides scalability in the working model of the big data management program based on the business intelligence system. RFID tags are used for reading the sensor data, which is then stored in a relational database. The focus should be given to managing the sensor data accurately while preserving the privacy of the data, and the privacy methods should be extended by deploying an effective protocol for securing the information stored in the database. A prototype is developed for managing the data collected from the heterogeneous environment: the system takes, receives, and manages heterogeneous data from all sensor devices placed in the warehouse. The preparation of the synchronization module depends on the real time analysis of the inventory management system. Inconsistencies caused by missing data should be analysed periodically through real time monitoring of the information, which helps in setting the normal threshold value used for predicting and comparing readings to find the underflow and overflow conditions, as sketched below.
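A minimal sketch of this threshold comparison, assuming hypothetical item identifiers and numeric stock levels, is given below; it is illustrative rather than the actual alerting component.

from dataclasses import dataclass

@dataclass
class Threshold:
    lower: float  # minimum acceptable stock level
    upper: float  # maximum acceptable stock level

def check_reading(item_id: str, level: float, threshold: Threshold):
    """Return an alert record when the level breaches the threshold, otherwise None."""
    if level < threshold.lower:
        return {"item": item_id, "condition": "underflow", "level": level}
    if level > threshold.upper:
        return {"item": item_id, "condition": "overflow", "level": level}
    return None

# Example: a reading of 12 units against a 20-200 unit band raises an underflow alert.
print(check_reading("SKU-001", 12, Threshold(lower=20, upper=200)))

In the proposed model the returned record would be forwarded to the notification path that raises the alert on the user's mobile phone.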
Demonstration of the Proposed Data Model:
The demonstration process helps in analysing the efficiency of the proposed model in preserving the data securely in the database. It helps in finding the time required for managing the queue in the batch processing unit and for generating relevant responses in the stream processing unit within a minimum period of time. The implementation of the data model focuses on delivering an alert signal to the mobile phone of the user about the available inventory in case of an underflow or overflow condition of the commodity in the warehouse. Measuring the results helps in evaluating the performance of the proposed system in comparison to the original Hadoop model used for handling the data in systematic order.
Evaluation and measurement:
The test performance of the proposed model helps in measuring and evaluating the level of accuracy and efficiency achieved in handling the voluminous data of the inventory control system. The evaluation process focuses on assessing the performance of the proposed model with respect to the research questions prepared at the start of the thesis, as described in the table below:
Research Questions and Evaluations

1. What is big data modelling? What are the challenges faced in the application of big data modelling techniques?
Evaluation: The research thesis provides knowledge about the big data model and its application in the real time inventory management system.

2. What are the applications of big data modelling?
Evaluation: Big data models are used for defining the read and write operations on the voluminous data stored in the database.

3. What progress and recent trends are seen in big data modelling?
Evaluation: Huge progress is seen in big data modelling due to the invention of cloud computing technology, the Internet of things, mobile networking, and Android smart phones. Data warehousing and data mining technologies play a vital role in managing the data in a structured format, and the innovation of software defined networks helps in the transmission of big data in the cloud computing environment.

4. How is big data modelling useful for managing the voluminous data of cloud computing and IOT architecture?
Evaluation: The focus is given to managing the security and integrity of the information stored in the database. The data should be aggregated in a sequential format to minimize data replication, and multidimensional mining is implemented for organizing the inventory information in a systematic manner. The sensor data is placed in three layers of information, classified as the application layer, the query accessing layer, and the analysis layer.
Test cases for analysing the evaluation process:

Criteria for evaluating the design model | Description | Priority | Success Measures
Accuracy in retrieving query output | The focus should be given on measuring the accuracy of the responses retrieved for the query posted by the user. | High | Yes
Execution process and operation time | The focus should be given on measuring the time required for completing a particular execution process within the given period of time. | Low | Yes
Operational efficiency of the server | The operational efficiency of the server can be measured in terms of the time required for completing the processes and procedures within the given period of time. | Medium | No
Overloading acceptance of processes | The fault tolerance power should be measured by analysing the number of processes associated with the server for the acceptance of queries in the queue. | Low | No
Vulnerability attack | The focus should be given on analysing the impact of vulnerability attacks on the processes. | High | Yes
Association of the data analytical tools | The focus should be given on analysing the power of the analytical tools associated with the processes and the queries posted by the user. | Medium | No
Utilization of CPU and processor usage | The CPU utilization can be measured in terms of the processes and queries executed effectively. | High | Yes
Utilization of memory | The memory utilization is measured in terms of the processes and query responses contained and stored in the memory. | High | Yes
The baseline of the security measures should be pre-defined so that efficient and effective methods can be laid down for preserving the confidentiality of the information. The communication path of the data packets should be defined for transmission between the sender and the receiver, and authorised controls should be developed for systematic binding of the request-response model. Sensitive data should be collected from the sensing devices for executing the queries in the specified format, and test cases should be developed for analysing the impact on the performance and operation of the inventory control model of the big data management system. Redundant data should be removed from the storage spaces so that the large memory chunks hold only data relevant to the queries posted by the user, and transparency of the information helps in improving the data quality. The data is classified according to the confidentiality, accuracy, integrity, and availability of the information (Kaur, and Dhaliwal, 2015). The data protection method keeps the secured information in a protected format. Real time information can be gathered through the smart devices to obtain accurate information about the inventory available in the warehouse; batch acquisition helps in gathering information from the large scale database, and the cleansing function removes redundant information from memory. The critical sources should be assessed when collecting sensitive information from varying origins. Multiple tests should be performed for measuring the performance of the processor in answering queries: the execution time of each process is calculated to identify the efficiency of the processor, the response time of the query is measured to characterise the information flow, and the total execution time needed to produce the required output is identified. The average values of the server baseline processes should be measured for calculating the accuracy of the processes, and deliberate overloading of the processes is used for identifying the server's fault tolerance capability. The accessibility of the data depends on the execution speed of the processor. The normal threshold value of the inventory should be identified for calculating the real time condition of the inventory available in the warehouse, the sensitivity of the information depends upon the query result, and information accessibility depends upon the analytical procedures used for identifying the process flow. Accessing the data sets from the different locations of the warehouse provides the key to combining the results of the real time sensor devices. A sketch of such a measurement harness is given below.
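The measurement loop described above can be sketched as follows; run_query is a placeholder for the real query path, the psutil package is assumed to be available, and the averages stand in for the server baseline values mentioned in the text.

import time
import statistics
import psutil

def benchmark(run_query, queries, repetitions=5):
    """Time repeated query executions and sample CPU and memory utilisation."""
    timings, cpu_samples, memory_samples = [], [], []
    for _ in range(repetitions):
        for query in queries:
            started = time.perf_counter()
            run_query(query)  # placeholder for the system under test
            timings.append(time.perf_counter() - started)
            cpu_samples.append(psutil.cpu_percent(interval=None))
            memory_samples.append(psutil.virtual_memory().percent)
    return {
        "mean_response_seconds": statistics.mean(timings),
        "mean_cpu_percent": statistics.mean(cpu_samples),
        "mean_memory_percent": statistics.mean(memory_samples),
    }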
The name node is organised for processing the map reduce function on data blocks arranged in a sequential format, and the mapping of the information is done by arranging the assorted data in sorted order; a minimal illustration of the map and reduce steps is given after this paragraph. The data nodes are used for creating the instances of the building blocks of the data, and the applications can be managed on the distributed data server. The task tracker is responsible for producing progress reports on the distributed nodes and for tracking the jobs; relocating processes helps in managing the data retrieval efficiently, and job tracking can be managed effectively on the data server. Extraction, loading, and transformation are the major operations performed by the database server when retrieving information from multiple sources. Data mining operations are performed to improve the integration of the data units and to develop effective data models for the cleansing operations of the management process. Data quality can be improved by managing the agility and scalability of the information, and rapid retrieval of the data helps in resolving its complexity. Statistical methods are used for performing calculations on the structured data stored in the data tables, and parallel processing of large amounts of data items improves the extraction process from the non-relational database. The software defined network helps in controlling the delays in the exchange of information, and aggregated functions push the data to the edge of the queue. The bandwidth available to the network traffic can be increased by organizing the reference model of the Hadoop distributed file system. The performance of the system depends on the index table prepared for accessing the desired information from the multiple data sources, and the spectrum of the big data is prepared for presenting the data in a structured format. Map reduce technology is used for streamlining the information in the configured database. The workload depends on the capacity of the network to hold the processes and queues in the underlying batch processing unit, and the size of the big data is the major factor responsible for the scalability requirements of the network. Cross layer segmentation of Hadoop is used for recognising the relevant information in the processing schedule of the data blocks. The progress of big data has accelerated due to the association of cloud computing and mobile network architectures, and the potential growth of the cloud environment can be increased by deploying a resource management program. The decision making capability of the system manager can be improved by sharing data through the implemented data sharing processes and operating system, and virtualization of the applications increases the accessibility of information from multiple sources and the cloud environment. The Radoop architecture improves the management of resources according to the real time needs of the system architecture, and the data flow schedule can be enhanced by applying a hash function to maintain the index table. The parameters of the system can be tuned using the log files stored in the Hadoop architecture, and cost minimization algorithms are used for managing the resources at multiple data stores. Data centric benchmarks are created for analysing the flow of information and for enforcing the security architecture on the information stored in the database. The mobile data is collected from a heterogeneous environment, so there is a high risk of privacy and scalability issues with the information stored at the different data centre locations.
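As a minimal illustration of the map and reduce steps referred to above, the pure-Python sketch below emits (item_id, quantity) pairs in the map phase and sums them per item in the reduce phase; the record shape is assumed for illustration, and a real Hadoop job would distribute the same logic across data nodes.

from itertools import groupby
from operator import itemgetter

def map_phase(records):
    # Emit one (key, value) pair per sensor record.
    for record in records:
        yield record["item_id"], record["quantity"]

def reduce_phase(pairs):
    # Shuffle/sort: group the emitted pairs by key, then sum each group.
    for key, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)):
        yield key, sum(quantity for _, quantity in group)

records = [
    {"item_id": "SKU-001", "quantity": 4},
    {"item_id": "SKU-002", "quantity": 7},
    {"item_id": "SKU-001", "quantity": 3},
]
print(dict(reduce_phase(map_phase(records))))  # {'SKU-001': 7, 'SKU-002': 7}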
Chapter 4: Proposed Big Data Architecture
In this chapter, we propose and design a big data modelling technique for managing the stock inventory system by sending an alarm and message to the mobile phone of the owner about underflow and overflow conditions of the stock available in the organization. From the research, we have found that storage area networks and network attached storage hold the data in the data stores, and that the data mining technique relies on the nearest server for finding and extracting related information about the inventory of the organization. The inventory control system should be developed for managing the commodity in an ordered manner, and the inventory can be accessed effectively by the user through the e-commerce trading mechanism. Business intelligence is used for increasing the efficacy of the general supply chain management model, and real time operations support decision making through data mining. The operations can be controlled to optimize the requirements of the business intelligence program, and performance can be accelerated by using a predictive analytical approach that keeps the latency period to a minimum. Aggregation of homogeneous data is used for managing the real time information. The traditional architecture should be extended and expanded to support effective management of big data; electronic data interchange reduces the complexities of the data management program, and virtualization of information sharing represents the data in a more formalized manner. The architecture is composed of several components: sensor data, infrastructure supporting cloud storage of the files, integration and management of data buses, data storage techniques, a data analytics engine, and, lastly, a visualization and rendering system. The objects and sensor data feed the acquisition system, with RFID tags managing the sensors and actuators. The cloud infrastructure manages the connection between suppliers and manufacturers and balances the data generated from the logistics operations, while the data bus retrieves the data from the internet and remote locations. Real time processing can then be carried out to optimize the data storage unit, and the analytical engines transform the raw data into processed data (Katsipoulakis, and et. al., 2015). The complete architecture depends upon the placement of the data analytical engine, and the interoperation of the processes can be managed effectively by integrating the system design with the sensor data. The focus should be given to the types of cost relevant to the inventory management system, which are summarised below:
Purchase cost: The purchase cost represents the actual cost of the commodity.
Order cost: The order cost represents the additional, miscellaneous costs incurred in preparing the purchase order.
Carriage cost: The carriage cost represents the cost spent on managing and storing the commodity in the warehouse.
The study of the literature helps us to identify the methods that are useful for managing the large volume of data generated in the organization with respect to the company inventory. The map reduce framework is the most optimized technique for organizing the big data generated by clusters of machines; a master-slave architecture is implemented for running the map and reduce functions, and intermediate files are searched for mapping the required information from the database. Traditionally, organizations use two technologies for managing large voluminous data, namely Hadoop map reduce and the Hadoop distributed file system, and HDFS remains the most promising model for finding the right information at the right time. Some of the most commonly used data processing technologies are summarised in the table below:
Storm: Real time data can be accessed easily with the Storm processing engine. It is based on two building blocks, spouts and bolts: the spout feeds the input queue by loading and sequencing the data in a systematic format, while the bolts process the output queue of the data. Storm arranges three kinds of nodes, named the master node, the ZooKeeper node, and the worker node, and intermediate files are searched for mapping the required information from the database.
S4: The S4 technology is used for processing data on a distributed system. The Java language is used for synchronizing the events across the systematic organization.
Spark: The Spark technology is used for accessing data from Hadoop and the HBase database. SQL is used for modifying the data in the data warehouse, and distributed datasets are created for transforming the data into small chunks of information.
Apache Flink: Real time data can be handled effectively by this open source engine. The map reduce technology is used for finding similar data and keeping it in particular datasets, the Machine Library (MLib) is used for increasing the scalability of the applications, and iterative processing techniques are used for fetching the data from the different sources in real time.
MongoDB: This NoSQL database is used for storing large volumes of data. The data is stored in a machine readable format.
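As a hedged illustration of how one of the engines listed above could be applied to the inventory data, the PySpark sketch below loads semi-structured readings, aggregates quantities per item, and flags items below a chosen level; the HDFS path, field names, and threshold are assumptions, not part of the proposed system.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("inventory-aggregation").getOrCreate()

# Hypothetical path and schema: one JSON record per sensor reading.
readings = spark.read.json("hdfs:///inventory/sensor_readings.json")
totals = readings.groupBy("item_id").agg(F.sum("quantity").alias("total_quantity"))
low_stock = totals.filter(F.col("total_quantity") < 20)  # illustrative reorder level

low_stock.show()
spark.stop()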
Related work:
Researchers are working on managing the real time information of the inventory for the organization in order to overcome the problems of underflow and overflow of the commodity. The focus is given to managing the security and integrity of the information stored in the database. The data should be aggregated in a sequential format to minimize data replication, and multidimensional mining is implemented for organizing the inventory information in a systematic manner. The sensor data is placed in three layers of information, classified as the application layer, the query accessing layer, and the analysis layer. Data accessibility and retrieval are provided through the application layer, which also helps in optimizing data storage and sharing through the security procedures. The query accessing layer sorts the data systematically according to the query posted by the user, while the analysis layer analyses the data and removes replicated data from the data warehouse. Monitoring is performed by analysing the sensor data stored by the real time application units, and transmitters send the data from the sensor units to provide information about the real time inventory available in the warehouse of the enterprise. The Hadoop database keeps the details of the sensor units for measuring the remote information of the inventory, and the sensor devices are placed in the warehouse for remote detection of overflow and underflow of commodity units (Condie, and et. al, 2015). The Hadoop processing unit measures the statistical information of the commodity present in the warehouse of the organization, and the mapping functions analyse the difference between the normal threshold value and the readings of the sensor data. In the condition of underflow or overflow, alarms are sent to the application layer for managing the relevant quantity of the inventory in the warehouse, which helps the owner of the organization to take appropriate decisions, fulfil the requirements of the customers, and increase their level of satisfaction. The Spark technology is used for developing a prototype for systematic organization of the data (Malik, and Sangwan, 2015). The data mining operations are performed to overcome the complexity of getting relevant information from the voluminous
data. The research helps in identifying the problems associated with the big data processing environment. Heterogeneous data collected from different platforms can be handled comfortably in a semi-structured format when scaling big data on the data server, and accessibility can be improved by synchronizing the information across the structured and semi-structured architectural formats. The scaling function helps in managing and storing the large volume of data in a proper format and increases the speed of the data retrieval process. Time is the major factor for the reliability of the Hadoop distributed file system in managing the organization of big data, and the data sets can be retrieved at high velocity by scaling the data size. Data privacy is the major issue in handling big data in the cloud environment, so multiple security protocols are used for securing the information on cloud computing and mobile networks. The computational speed of the data retrieval process can be optimized by assembling the data from different formats; map reduce technology drives the Hadoop kernel so that operations and functions run in the parallel processing unit, and the mapping of the information iterates to check the scheduling of the data units. The Hadoop system holds the data in a columnar structure, and distributed data sets manage the queries and responses presented by the user. The persistence and performance of the data sets can be optimized by keeping more of the data configured in memory, and the retrieval process can be improved by arranging plugins for the micro-kernel. The error handling program for arranging the data sets enhances the timeliness of visualizing the data in a structured format, and the processing of big data can be optimized by arranging the application domains so as to minimize the technological challenges arising with the advancement of cloud computing and mobile networks. The diversification of the data increases the vulnerability of the information and data stores to attacks. Statistical data can be collected to increase the usefulness of the randomised data gathered from the varying data sources, and noise in the data packets sent from sender to receiver should be detected. The sensitivity of the data can be measured by evaluating the accuracy of the information retrieved while processing the user's query across batches of different processes. An index server is placed for searching and mapping the processes through the index values generated by the hash functions, and the data stores are managed to increase the operational efficiency and the accuracy of the retrieved information. A session manager is appointed for managing the information retrieval session and coordinating the data values to increase the throughput of information transactions. The SQL statements and data manipulation statements are received and handled by the SQL processor for
executing and scheduling the processes according to the requirements of the query statement. Multidimensional tables are developed for storing the requests at the multi-dimensional processor unit, and the preprocessing and extraction of the information is scheduled and coordinated by the index server in an authorised and authenticated manner. Standalone information is recognised by the name server to minimize the complexity of the data retrieval process, and the memory is configured in a columnar format so that the operating efficiency of the processing unit can be optimized. The modelling and representation of the data is customised in three different forms, named the attribute view, the analytical view, and the calculation view. The attribute view describes the functionality and characteristics of the data sets arranged on the processing unit; the analytical view presents the data in tabular format, with aggregation functions used for performing operations on the large volume of big data. The schema definition organizes the master data in a star schema format, and the column data represents the particular characteristics of the data sets. The complexity of the data can be resolved by presenting the information in data tables, and the association of objects helps in presenting the information in the large data sets. Authorised information can be scheduled to fulfil the requirements of the user by designing the SQL statements, and the schema definition helps in administering the data controls applied when sending data packets between the sender and the receiver. Repositories are developed for accessing the required information from the large data sets, and specific rows of information present the data values sequentially in tabular format. The development of the big data model is an advancement over the traditional data model in managing an effective system call interface. The block I/O unit manages the flow of information through the object model interface, and big data models increase the accessibility of the information by managing the metadata on the physical storage system. Navigation of the data tables is done through the index value calculated from the hash value, and identification of the root object increases the accessibility of the desired information. A unique ID is created for each object block to find the data in a particular storage space, and multiple operations can be performed on the user objects to specify the location of and operations on the components. Flexibility in retrieving objects can be optimized by sequencing the allocation of the data entities, free space can be increased by minimizing the redundant information stored at different data centres, and the physical media is searched for retrieving the specified information. Decision trees are developed by arranging the data in hierarchical data stores for separating the
information between the metadata and the server information. The accessibility of the data can be enhanced through parallel processing, and sharing data helps in managing the relationships and interdependencies among the smart devices located in the warehouse. The behavioural pattern of the data should be analysed before offloading the data from the server location, and the performance of the host devices can be increased by developing an intelligence system that performs the transactions in an aggregated format. The dependencies between files can be managed by controlling the granularity of the security architecture, and the information is organized in block format. The security manager is responsible for authorizing and authenticating the information transactions from the varying sources, and the security credentials of the information should be handled carefully to minimize the vulnerabilities introduced on the network while large volumes of data packets are transmitted. The interface classes should be specifically defined so that an abstract layer can be developed for creating instances of the classes. Clustering and de-clustering operations can be performed to increase the compatibility of the storage objects, and a direct mapping pattern increases the association between the data streams and the processes interlinked with each other. Logical objects are identified according to their logical object ID, size, and attributes, while storage objects are customised with their length, size, ID, and attributes. The clustering of the data sets is accompanied by a mapping process that establishes the relevance of the data objects, and parallel processing of the information can be performed by exploiting the I/O bandwidth of the data items. Logical objects can be retrieved directly from the storage objects, which increases the flow of information between the large data sets and supports the transaction processing system. Object pooling can be customised with the granularity of the access control mechanism, and the metadata represents the sharing and flow of information between the objects in line with the access control policies.
Our research is a step towards a big data model architecture that sends real time alert signals to the mobile phone of the enterprise owner, who is thereby provided with real time information about the inventory available in the warehouse. The inventory data can be systematically organized by implementing the batch processing system and the stream processing system together in a single unit. In our research, we develop a conceptual architecture for managing the big data of the organization's inventory control by using sensor technology. The available big data technologies are used for designing a system capable of delivering real time information about the inventory to the mobile phone of the company owner.
Proposed Architecture
In this chapter, we develop a conceptual big data architecture for detecting conditions such as underflow and overflow relative to the normal threshold value. The heterogeneous data of the different commodities can be handled effectively by the proposed design, which combines the batch processing model and the stream processing model. In the batch processing system, data sets are prepared so that the data is stored on a particular data node. The filter function extracts the required data from the large data sets of inventory control information, and the cleaning function removes replicated data from the database. The selection and extraction of the inventory information from the large data sets are the two most important functions for handling the query placed by the user effectively. The data is received from multiple sources, so the structure and format of the information must be synchronized. A sliding window is implemented so that the pre-processor algorithm can split the data into different blocks; the information is stored in blocks, and the data is retrieved from the blocks by applying the filtering and cleaning functions in both batch and stream processing. A minimal sketch of the sliding window follows.
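The following is a minimal sketch of that sliding window, with the window size and record shape chosen only for illustration: incoming stock-level readings are kept in a fixed-size window so the stream side can compute a rolling average without holding the full history.

from collections import deque

class SlidingWindow:
    def __init__(self, size=10):
        self.readings = deque(maxlen=size)  # oldest readings fall out automatically

    def add(self, level):
        """Add a new stock-level reading and return the current rolling average."""
        self.readings.append(level)
        return sum(self.readings) / len(self.readings)

window = SlidingWindow(size=5)
for level in [120, 118, 95, 60, 42, 30]:
    print(round(window.add(level), 1))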
Creation of the Batch Processing Unit
The batch processing unit is developed for extracting the relevant information from different sources and formats. It helps in organizing the heterogeneous data in a systematic format.
Development of the data acquisition layer:
The data acquisition layer measures the real time information related to inventory control, and the data is stored in semi-structured and structured formats. The information measured and monitored by the data acquisition layer covers storage server information, supply chain big data, marketing, procurement, warehouse, and transportation.
Storage Server Information: The information related to the storage server information helps
in keeping the details of frequent accessing of the inventory information. It helps in
measuring the capability of accessing the required information from the database. Data size
gives the details of files required for keeping the large voluminous data in the structured
format. The data is categorised according to the data types used for representing the
information in the heterogeneous format. The size of the queue is used for analysing the
waiting time of the request. The availability of the replicated information helps in analysing
and measuring the fault tolerance capability of the designed model. The user control helps in
managing the transfer of data over the network.
Supply Chain Big Data: The supply chain big data is characterised by the 5 V's, which represent data velocity, data veracity, data value, data volume, and data variety (Mathias, and et.al, 2014). The voluminous data comes from ERP transactions for e-commerce trading, customer surveys, inventory claims by customers, invoice generation, and many other sources. The data can be moved between the different units of the network in horizontal and vertical alignment, and valid data can be retrieved from the transactions. The focus should be given to managing the integrity and confidentiality of the big data in the supply chain. GPS transponders and RFID readers are used for finding the location of the commodity in the warehouse.
Marketing: The customer attention can be taken by carrying out exclusive marketing of the
product. The information about the product should be stored in the blocks for increasing the
efficiency of data retrieval according to the demand placed by the user (Li, 2013). It helps in
sensing the behaviour of the customers. It increases the accessibility and feasibility of the
information.
Procurement: The procurement process helps in managing the coordination between the
supply chain data. The strategies should be developed for purchasing the commodity
according to the availability of the inventory in the warehouse. The procurement data is
organized in the semi-structured format for increasing the affordability of delivery among the
customers which in turn helps in increasing the level of satisfaction among them.
Warehouse: The use of RFID reader helps in increasing the coordination in the data mining
process from the data warehouse. The systematic approach can be followed for getting the
right data at right time. It increases the automatic sensing of data availability in the data
warehouse. The extended sensors are used for increasing the power of intelligence. It helps in
increasing the inventory control on the commodity availability in the data warehouse. The
data retrieval velocity can be effectively improved due to the systematic organization of the
inventory in the data warehouse.
Transportation: The routing of the packet delivery system is the most important step in
developing the effective inventory control system. The static data and dynamic data of the
routing process should be systematically stored in the working model of managing the big
data so that the path of the commodity delivery can be linked with the GPS system for easy
tracking of the product by the customer and administrator. It helps in increasing the level of
customer satisfaction by creating the easy tracking of their shipment sent by the organization
(McAfee, and Brynjolfsson, 2012). The balance should be made between the underflow and
overflow condition of inventory with respect to the net threshold value.
Inventory Record System: The database contains all the information related to the inventory
and commodity available in the warehouse in the digital format. There are different sources
of data such as warehouse operations, procurement, Inventory cost, logistics, customer
reviews, and many others.
Inventory Images: The image of the inventory and commodity can be effectively stored in the
database. It helps in bringing accuracy in taking effective decision.
Sensor Data: This is the information retrieved from the monitoring tools of the warehouse, such as cameras and microphones. The real time availability of inventory can be checked for overflow and underflow conditions, which determines when an alert message is sent to the mobile phone.
Association of mobile phones: The mobile phones receive the alert signal when the inventory moves into an overflow or underflow situation relative to the normal threshold value. Satellite positioning services and cameras are used for getting the real time information about the inventory.
Semantic Module: An ontological approach is used to automate the management of the inventory effectively. XML is used for interchanging the relevant information regarding the inventory management system, and an XML document is prepared for managing the big data of the inventory so as to fulfil the user requirements. A small sketch of such an interchange document follows.
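The sketch below builds such an interchange document with the Python standard library; the tag and attribute names are purely illustrative rather than a defined schema.

import xml.etree.ElementTree as ET

def build_inventory_document(items):
    root = ET.Element("inventory")
    for item in items:
        element = ET.SubElement(root, "item", id=item["id"])
        ET.SubElement(element, "quantity").text = str(item["quantity"])
        ET.SubElement(element, "location").text = item["location"]
    return ET.tostring(root, encoding="unicode")

print(build_inventory_document([
    {"id": "SKU-001", "quantity": 42, "location": "aisle-3"},
]))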
The diagram below shows the presentation and proposed model for generating the big data
from heterogeneous environment:
(Source: Made by author)
Preparation of the Data Layer
The data layer performs the computation and calculation regarding the availability of the commodity in the warehouse. Predictive models are prepared for identifying the efficiency of the data mining process used for fetching the relevant information from the data warehouse.
Data Filtration process: The data filtration process divides the large chunks of information into smaller blocks, which helps in measuring the normal threshold value for the inventory control management process.
Data cleaning process: The data cleaning process handles the missing information of the inventory. The normalization process is implemented for synchronising the data effectively, and noisy data should be minimized so that relevant information can be recovered in place of the missing values. A small cleaning sketch follows.
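A minimal pandas sketch of these filtration and cleaning steps is given below; the column names are assumptions, and the exact rules of the proposed model may differ.

import pandas as pd

def clean_readings(frame: pd.DataFrame) -> pd.DataFrame:
    # Drop duplicate sensor rows reported for the same item and timestamp.
    frame = frame.drop_duplicates(subset=["item_id", "timestamp"]).copy()
    # Treat physically impossible (negative) readings as missing values.
    frame.loc[frame["quantity"] < 0, "quantity"] = None
    # Fill short gaps of missing values, then drop anything still missing.
    frame["quantity"] = frame["quantity"].interpolate()
    return frame.dropna(subset=["quantity"])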
Selection and Extraction of the required information
The monitoring of the inventory management system can be done by implementing the
proliferation devices process. The required information related to the inventory can be
collected from the large volume of data. The features of the inventory are used for matching
the required information of the user. The statistical tools are used for analysing the underflow
and overflow condition of the inventory available in the warehouse.
Implementation of the Predictive Model
The predictive models are prepared by analysing the availability and non-availability of the inventory in the warehouse from the historical records of the organization, which improves the accuracy and quality of inventory management and control. The data can be monitored by analysing statistical reports and graphs, and the batch processing units supply the data demanded by the machine learning approach.
Data storage
The voluminous data of the inventory management system is difficult for traditional databases to handle, so the storage architecture combines HDFS and MongoDB. This increases the scalability available for storing the information related to the commodity in an effective manner. The data is collected from heterogeneous sources and stored in a semi-structured format, as sketched below.
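The sketch below shows, under assumed connection details and collection names, how cleaned semi-structured readings could be written to MongoDB while the raw files stay on HDFS; the index on item_id keeps later lookups fast.

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed local instance
collection = client["inventory"]["sensor_readings"]

collection.create_index("item_id")
collection.insert_many([
    {"item_id": "SKU-001", "quantity": 42, "timestamp": "2023-04-19T10:00:00Z"},
    {"item_id": "SKU-002", "quantity": 7, "timestamp": "2023-04-19T10:00:05Z"},
])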
Preparation of the Stream Processing Layer
The development of the stream processing layer depends on the synchronization of the data
module, pre-processor adaptive model, and predictor adaptive module.
Data Synchronization process: The preparation of the synchronization module depends on the
real time analysis of the inventory management system. The inconsistency of the missing data
should be periodically analysed by measuring the real time monitoring of the information. It
helps in setting the normal value of the threshold for predicting and comparing for finding the
underflow and overflow condition (Hashler, and et. al, 2014).
Pre-processor adaptive model: The pre-processing tasks help in analysing the arrival of the inventory, and data extraction can be performed through a machine learning process. The streaming data is extracted by the adaptive system, and accuracy is achieved by analysing the results of the stream processing units. Overlapping information is managed within the data sets, and the sliding window technique controls the arrival of the data in the processing units. The accuracy of the system is achieved by synchronizing the pre-processing units, and an adaptive predictor adjusts the normal threshold value according to the current situation of the data units (Zheng, and et.al., 2015), as sketched below.
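One way to realise such an adaptive threshold, sketched here with an exponentially weighted moving average and illustrative smoothing and band parameters, is shown below; it is an assumption-laden stand-in for the adaptive predictor rather than its actual definition.

class AdaptiveThreshold:
    def __init__(self, initial_level, alpha=0.2, band=0.5):
        self.average = initial_level
        self.alpha = alpha  # weight given to the newest reading
        self.band = band    # half-width of the acceptable band, as a fraction of the average

    def update(self, level):
        """Fold in a new reading and return the updated (lower, upper) bounds."""
        self.average = self.alpha * level + (1 - self.alpha) * self.average
        return self.average * (1 - self.band), self.average * (1 + self.band)

threshold = AdaptiveThreshold(initial_level=100)
print(threshold.update(80))  # the bounds shift towards the lower recent demand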
Predictor Adaptive Module: This module plays a vital role in providing updated information
for sending the message to the mobile phone of the user. The prediction of the inventory
information helps in keeping the updated version of the information. The mapping function is
used for finding the information asked by the user for managing the inventory in the
synchronised format.
Query Processor: The query processor manages the queue of queries posted by the user to obtain the desired information about inventory and commodity availability. It sends an out-of-stock or in-stock message for the product asked about by the user to the user's mobile phone. The query processor generates the responses by collecting and combining the results of the batch processing unit and the stream processing unit, as sketched below.
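A minimal sketch of that merge step is shown below: the batch view holds totals computed over historical data, the stream view holds the increments seen since the last batch run, and the combined figure decides the in-stock or out-of-stock message. The data structures and reorder level are assumptions for illustration.

def answer_query(item_id, batch_view, stream_view, reorder_level=20):
    """Combine the batch and stream views and classify the stock status."""
    total = batch_view.get(item_id, 0) + stream_view.get(item_id, 0)
    if total <= 0:
        status = "out of stock"
    elif total < reorder_level:
        status = "low stock"
    else:
        status = "in stock"
    return {"item": item_id, "available": total, "status": status}

batch_view = {"SKU-001": 150}
stream_view = {"SKU-001": -140}  # recent sales recorded by the stream layer
print(answer_query("SKU-001", batch_view, stream_view))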
Visualization layer: The visualization layer represents the output of the monitoring report for the inventory available in the organization. The data is processed in real time and presented to the user so that the updated and required information can be obtained from the voluminous data. Proactive messages are sent to the user's mobile phone to keep the inventory of the organization in balance, and the real time information is fetched from MongoDB.
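For example, the visualization layer could fetch the latest abnormal readings from MongoDB with a query such as the one below; the connection string, database name, collection name, and fields are hypothetical placeholders rather than details taken from the original design:

from pymongo import MongoClient, DESCENDING

# Minimal sketch of fetching recent inventory readings from MongoDB
# for the visualization layer. Connection string, database, collection,
# and field names are illustrative assumptions.
client = MongoClient("mongodb://localhost:27017")
collection = client["inventory_db"]["stock_readings"]

latest_readings = (
    collection.find({"status": {"$in": ["underflow", "overflow"]}})
    .sort("timestamp", DESCENDING)
    .limit(20)
)

for doc in latest_readings:
    print(doc["item"], doc["quantity"], doc["status"])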
The systematic arrangement of the components provides robustness and accuracy in the management of the sub-systems and the complete architecture of the next generation big data management model for the inventory control system. The traditional big data architecture is composed of NoSQL databases, the Hadoop distributed file system, and map reduce technology. The clustering of the files is done for managing the parallel processing system, and the parallel computation of the processes helps in balancing the huge amount of data in an effective manner. The servers are arranged in a loosely coupled manner for systematic
transmission of data with the clients. The files are arranged in HDFS according to the name node and data nodes of the documents. The map reduce technique can be accelerated by using the job tracking system, which assigns each process to the relevant data nodes. The location and address of the required data can be tracked by the job tracker to keep the process queue balanced between the client and server, and communication is managed through the master-slave process. Jobs are executed by preparing a look-up table for extracting the relevant information from the database. Real time operations can be performed by implementing OLAP (online analytical processing) technology. Mobile devices are used to access the information retrieved through the job tracker. The data is stored in structured and semi-structured formats so that it can be retrieved effectively. The map reduce technology provides scalability in the working model of the big data management program based on the business intelligence system. RFID tags are used for reading sensor data, which is stored in a relational database (Oussous, and et.al., 2018). The integration of the devices helps in performing the services in an optimized format. Security is a major concern for protecting the information related to the inventory so that overflow and underflow of data do not occur, which minimizes the wastage of inventory by keeping it circulating. Analytical engines are used for managing the massive data of the supply chain management program.
The Diagram below shows the proposed architecture of big data management model for
managing the large volume of data in a systematic format.
(Source: Made By Author)
Description of the System Architecture
The proposed architecture is the user friendly management of inventory data to get effective
results for solving the problem of big data management. The development of the user
interface provides suitability to the user to get interactive with the features and availability of
the product in the warehouse to be delivered on time. The operations related to the inventory
help in developing the interface to provide statistics view of the data organization. It helps in
finding the sales ratio to the availability of the product in the warehouse (Deepika, and
Raman, 2016). It helps in keeping the balance between the demand and supply of the
inventory. The analysis of the data is used for satisfying and performing operation on the
processes waiting in the queue of batch processing unit which results in accelerating the data
retrieval process. The data mining algorithms are used for systematically arranging the
system process to present the data in the structured and optimized manner.
Development of User Interface: the interactive interfaces are created for developing the stock
management operations. The statistical charts are prepared to analyse the current status of the
inventory available in the warehouse. The turnover ratio can be calculated by analysing the
inventory index. The status of the inventory can be forecasted for managing the functioning
of the forecasting algorithm. The integration of a user friendly interface makes these stock management operations easier to use and manage.
Data analysis layer: Data processing is carried out through functional operations on the optimised data, and forecasting is handled by the exploratory functional module. Regression analysis is applied to define the inventory analysis and control process, while data mining algorithms organize the data so as to accelerate the data retrieval process and initiate the inventory analysis program. The underlying value chain model of big data, which sequences the data and information through a consecutive layered architecture of four layers (data generation, data acquisition, data storage, and data analytics), is described in detail below.
Task and System management program: The integration of the user friendly data
configuration is used for resolving the complexities related to the transmission of data over
the network. The execution of the different functional program can be done in distributed file
system. The work flows and scheduling of the process should be done automatically so that
the processing of the information can be accelerated at faster rate. The complexities of the
data can be minimized by systematic handling of the data in an appropriate format.
Exploration of the Data analysis layer: The statistical overview of the data management program helps in developing the required reports of the inventory management system, with a focus on analysing the data within a given time frame. The efficiency of the system can be optimized by dividing the data into heterogeneous formats so as to deliver a clear estimation of the inventory available in the warehouse. Algorithm libraries are used for designing the configuration of the operational panel, and sequential analysis of the workflows helps in optimizing the automatic execution of processes. Inventory forecasting can be performed effectively by applying a mining algorithm to the historical data, and the historical analysis report is used for defining the statistical report of the data available in the warehouse.
Result Visualization layer: The automation in the storage unit is used for updating and
retrieving the time series pattern of data availability. The statistical graphs are prepared for
analysing the current status of the warehouse. The operation of the data mining program
depends on the retrieval of required information according to the query posted by the user.
Dynamic Prediction Model: This model helps in achieving accuracy and interoperability for the data available in the warehouse. Machine learning algorithms are used to derive the time series pattern graph, and the available data can be forecasted by analysing the historical reports of the demand and supply of inventory in the warehouse. Linear regression algorithms are used for analysing the transactions made by users in order to optimize the processes in the batch processing unit.
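As a minimal sketch of this idea, a linear regression can be fitted to historical demand and used to forecast the next period; the data and the single time-index feature below are purely illustrative assumptions:

import numpy as np
from sklearn.linear_model import LinearRegression

# Minimal sketch: fit a linear trend to historical weekly demand and
# forecast the next week. The numbers are illustrative assumptions.
weeks = np.arange(1, 9).reshape(-1, 1)                        # weeks 1..8 as the feature
demand = np.array([100, 110, 108, 120, 125, 130, 128, 140])   # observed demand

model = LinearRegression().fit(weeks, demand)
forecast = model.predict(np.array([[9]]))[0]

print(f"Forecast demand for week 9: {forecast:.1f}")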
Detection of Data anomalies: The anomalies in the data retrieval program can be determined
by monitoring and tracking the normal threshold value of the index. The threshold value is
used for calculating the maximum and minimum points of the data retrieval process. The
performance of the system can be predicted by implementing the dynamic prediction model.
The intelligent decision can be taken by analysing the historical view and trends of the data
management in the warehouse (Padhy, 2012). The reordering of the data is the most crucial
step for managing the flow of data. The user request can be treated with the updated
information to avoid anomalies in the result of data extraction process.
The value chain model of the big data is developed to take intelligent decision for optimizing
the management of the inventory information in more structured format. The systematic
approach is followed for sequencing the data and information in consecutive layered
architecture. The layered architecture view of data representation is composed of four layers
which are named as data generation layer, data acquiring layer, data storage layer, and data
analytical approach (Kambhampati, 2016). The focus should be given on developing the
effective visualization layer for easy retrieval of information from the large volume of data.
The data mining approach is applied for pattern matching of the composed components and
data.
Data Generation layer: The data is generated from various heterogeneous platforms, producing complex data sets that can be searched by a domain specific index value. Data consistency is managed by dividing the information into large blocks and chunks. The variety and veracity of the big data in the inventory management system can be effectively handled by the data generation layer, which also manages the structured, tabular view of the data. XML files are created for the semi-structured format, while text, audio, video, and images are
handled by the unstructured format. The high frequency hierarchical data can be controlled
by the structured streaming process layer. The sensor data can be collected from the sensor
devices to represent the semi-structured stream process data. The Log files are created to
develop the unstructured stream processing layer architecture view. The velocity of the big
data can be controlled by summarizing the stream processing system. The security and
privacy issues can be systematically handled by managing the control on the architecture of
data flow. The response time and latency period can be calculated to govern the data
generation and management program. The data is stored on the basis of priority system to
schedule the processes systematically in the response and request queue. The real time
analytical approach is used for governing the data packets flow over the network for
controlling the congestion and traffic. The multi-structured and hierarchical view of the data
is developed for optimizing the storage capacity of the database. The data can be shared from
multiple locations to generate the log files on the basis of metadata index management
system. The diversification of data is helpful for dividing it into different heterogeneous data
format. The pre-processing function is implemented for handling the structured and
unstructured data in the tabular format. The metadata is used for finding the location of the
text, images, audio, and video on the database. The log files are used for handling the
unstructured data in sequential format. The analytical tasks of the big data can be stored and
processed concurrently by performing the cleansing operations.
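To illustrate the semi-structured path, a small sketch of parsing an XML inventory record into a flat, tabular form is shown below; the XML layout and tag names are hypothetical, chosen only for the example:

import xml.etree.ElementTree as ET

# Minimal sketch: turn a semi-structured XML inventory record into a
# flat dictionary suitable for tabular storage. Tag names are assumptions.
xml_record = """
<item>
    <sku>SKU-001</sku>
    <name>Widget</name>
    <quantity>240</quantity>
    <warehouse>W-12</warehouse>
</item>
"""

root = ET.fromstring(xml_record)
row = {child.tag: child.text for child in root}
row["quantity"] = int(row["quantity"])  # cast numeric fields

print(row)  # {'sku': 'SKU-001', 'name': 'Widget', 'quantity': 240, 'warehouse': 'W-12'}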
Data Acquiring layer: The transmission of data can be effectively done by developing the
chunks of information for easy pre-processing of the data. The analytical methods are used
for developing the sustainable data formation. The sensor data is collected from RFID readers
and sensor devices placed for getting current status of the inventory. The compression
technologies are used for storing the information in specified and structured format. The pre-
processing operations are performed for managing the data mining process to retrieve the
required information from the large chunks of data. The redundancy of the data can be
minimized by synchronising the information and implementing the cleaning function. The
data is injected into the batch processes to accelerate the velocity and frequency of data
accessing. The transmission of message takes place for getting knowledge about the real time
information of the stored data according to the request placed by the user. The complexity of
the data can be manageable by implementing and performing operation through the
utilization of complex event processing system. The extraction of the information takes place
through the event loading program in the data warehouse. The data integration tools such as
Microsoft SQL server and Apache scoop can be implemented in improving the performance
of the data retrieval process. The loading function is used for loading the information in the
memory by searching the information through the metadata index. The messaging system is
used for developing the data flow use case to increase the integration of the data from
different platforms. The motion of the data can be controlled through the distributed system
of Hadoop distributed file system. The sequential growth of the data depends on managing
the incoming and outgoing operations of data. The millions and trillions of data is transmitted
within the fraction of seconds by developing specified cluster of information to handle
specific information. The consistency of data storage process can be increased by injecting
and retrieving the information from real time analytical framework. The data acquiring layer
is responsible for sending the large volume of data at a very high speed. The synchronization
of the event can be systematically handled for preserving the functionality of the pre-defined
functions. The operational efficiency of the data packets can be improved by implementing
the union and join operations for retrieving the required information from the large volume of
data. The assessment of the information can be organized for increasing the fault tolerance
capability of the working model to manage operations in sequential format. The security
system is implemented for securing the privacy and confidentiality of the information. The
entrance of fraud can be detected. The aggregation functions are used for injecting the
required operation to stabilize the flow of information serially to take automatic decision. It
helps in minimizing the human interaction with the business intelligence program. The
authenticated and authorized control is helpful for retrieving the required information from
the working model of the big data management system. The ETL functions are used for
serially sequencing the processes in the query queue to generate frequent responses with
respect to the user demand. The message transmission process is useful for synchronising the
data streams. The communication between the data units can be done in the proper format
through the use of security protocols. It helps in data retrieval process through the external
sources. The events are synchronised in the NoSQL database by managing the key value pair
to search the data by the index system.
Data Storage: The capability of the data storage system depends upon the formation of the
large data sets and chunks for easy retrieval of information. The data management depends
upon the pooling of devices and resources. The configuration of the address takes place
through the indexing of the data node. The query process and response generation can be
accelerated by enabling the interface functions. The analytical procedures are laid down for
getting the required information from the data warehouse which helps in increasing the level
of user satisfaction. The extraction of the value depends on mapping functions implemented
for pre-processing of the information. The big data infrastructure support can be developed
by persisting the data storage technologies. The volume and veracity of the stored data should
be identified to organize the data on the distributed file system architecture. The flexibility of
the data storage unit can be increased by systematically arranging the files in a layered reference order. The key value pairs are generated to develop a graph based systematic arrangement of the database. The horizontal and vertical scaling of the data stores helps in distributing the data in a multi-structured format. The capabilities of the system can be optimized by
developing the structured format to synchronise the data in an ordered manner. The layered
orientation view is generated for streamlining the data on the distributed file system
environment. The layered framework is comprised of RDBMS, NoSQL Key value data store,
Documentation stores, Column optimized framework, Graph database, and distributed file
system. The vertical scaling of the database can be done by implementing the symmetric
multi-processing unit. Massive Parallel processing unit is implemented to develop the
structured data for analysing the control on the data retrieval process. The overloading of the
works can be manageable by reflecting the replicated copies of the data on the data server.
The data can be shared by the host devices to systematically synchronise the automation in the
distributed file system. The fault tolerance capability to access the files from the stored data
server can be minimized by scaling up and scaling down the access control on the data
retrieval process. The memory utilization helps in increasing the performance of the data unit
by developing the hierarchical tree structure. The data can be horizontally partitioned into
large data chunks to manage the loading of the information in the memory. The capabilities
of the big data can be maximized by pre-defining the velocity and volume of the large data
stored in the server unit. The real time retrieval of the information can be fetched by
performing the parallel processing operations.
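The horizontal partitioning mentioned above can be sketched as a simple hash-based assignment of records to data chunks; the number of partitions and the key field are illustrative assumptions rather than design decisions from the text:

import hashlib
from collections import defaultdict

# Minimal sketch of horizontal (hash) partitioning of inventory records
# into a fixed number of chunks. Partition count and key are assumptions.
NUM_PARTITIONS = 4

def partition_key(sku: str) -> int:
    """Map a record key onto one of NUM_PARTITIONS chunks, stably."""
    digest = hashlib.md5(sku.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS

records = [{"sku": f"SKU-{i:03d}", "quantity": i * 10} for i in range(1, 9)]
partitions = defaultdict(list)
for record in records:
    partitions[partition_key(record["sku"])].append(record)

for pid, chunk in sorted(partitions.items()):
    print(pid, [r["sku"] for r in chunk])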
Data processing layer:
The data operations performed to retrieve the relevant information from the large volume of
data are handled by the processing layer. The read and write operations are synchronized for
managing the information in batch on NoSQL database unit. The stream processing unit is
uniquely identified to preserve the identity of the information. The batch processing engine is
used for increasing the association of the data units. The RDBMS is used for performing the
SQL processing operations. The map reduce technology is used for arranging the processes in
the batch to optimize the flow of batch operations. The association of the SPARK technology
is used for performing the unified operations on the data and information stored in the large
memory spaces of the database. The distributed engine is used for enabling the machine
learning algorithms for optimizing the processing operations to emerge into single data units.
The aggregations and grouping of the operations is used for enhancing the functional
approach for completing the given tasks. The data retrieval rate can be optimized by
performing the massive parallel processing on the processes. The data can be sequentially
manageable on the working architecture of the stream processing unit. The real time
management and retrieval of information can be organized for correlating the intermediator
processes. The regular expressions are used for enhancing the functional program to originate
the user defined functions. The large datasets are prepared for reading the data from memory.
The structured code helps in executing the jobs at the faster rate. The output of query can be
stored in the stream process architecture. The data can be managed in micro-batches for
enabling the time constraints. The workload of the data can be organized through the
management of latency period and predictive analytical approach. The OLAP technology is
used for modelling the time series regression analysis of the working model. The Rapid
miners are used for synchronizing the information on the distributed working environment.
The excel sheets are prepared for handling the low latency period. The decision tree is
prepared for calculating the response time of the query. The tactical and strategic plans are
prepared for optimizing the building blocks of data to retrieve the specific information from
the given data table. The plans are executed to store the information uniquely in the data table
by scaling it in horizontal and vertical perspectives. The relevancy of the business questions
can be optimized by processing the operations in storage units. The use cases of big data
tables are prepared for storing the value in structured format. The multi-dimensional view of
the tactical plan is prepared for sequencing the information in the desired format. The information
is compatible to retrieve the information from the data acquisition and generalization layer.
The mapping techniques are implemented to develop the automatic charts of building blocks
to create the instances of the data. The information is organized in serial order to handle the
processing of information within the time frame. The latency period is used for accessing the
information within the generated response time. The unified processing is helpful in seeking
the information from heterogeneous environment to resolve the complexity of the data items.
The throughput of the batch can be optimized for developing the need of looping and
iteration to receive the desired output. The reference model is prepared to emphasize the
business intelligence techniques to synchronise the filtering of the processes. The expectancy
of the data volume should be judged proactively so that the association of the processing
engine can be effectively done. The read and write operations can be emphasized on the
scaling tools to increase the distributed processing of the information. The time step program
is developed to evaluate the gap between the data sets for synchronizing the operational
process. The real time information of the inventory can be measured by calculating the
operations on the XML and excel sheet. The dimension of the big data volume should be
proactively measured to increase the association of massive parallel processing program. The
CPU utilization can be increased by dividing the data into smaller fragments of memory. The
large volume of data can be easily manageable on the distributed environment. The structured
view of the database model helps in retrieving the information from the batch processing unit.
The iteration blocks are created for increasing the functioning of the machine learning
algorithm. The map reduce technology helps in increasing the document preparation support
on the underlying model of the data unit. The time series classification of the data can
increase the association of the system architecture. The filtering and cleansing operations
are performed to optimize the reduction of the multi-dimensional data blocks. The linear
regression methods are used for developing the decision tree to optimize the framework of
the data execution methods. The compatibility checks helps in increasing the accuracy and
confidentiality of the data and information stored in the warehouse. The system is organized
according to the suitability of the technology mix architecture. The analytical data blocks are
prepared for increasing the persistence of input and output of request and response model.
The storage capability of the database can be amplified by demonstrating the real time
execution of the information. The processing of the batches is done through the data stored in
the heterogeneous environment.
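A small PySpark sketch of the kind of batch aggregation this layer performs is shown below; the input path, schema, and grouping column are hypothetical and only illustrate batch-style processing on structured data:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Minimal sketch of a batch aggregation in the data processing layer.
# The input path and column names are illustrative assumptions.
spark = SparkSession.builder.appName("inventory-batch").getOrCreate()

readings = spark.read.json("hdfs:///data/inventory/readings/")  # assumed location

stock_by_warehouse = (
    readings
    .groupBy("warehouse")
    .agg(
        F.sum("quantity").alias("total_quantity"),
        F.count("*").alias("reading_count"),
    )
)

stock_by_warehouse.show()
spark.stop()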
The development and design of the architecture depend on the composition of the
application layer, computing layer, and infrastructure layer. The infrastructure layer is
organized for managing the synchronization of devices and hardware. The service level
agreement is signed for defining the pool of resources to assimilate the authorization of the
resources and devices over the cloud computing environment. The operational efficiency and
processing functions can be optimized. The abstraction model is prepared for creating the
logical functioning between the batch processes. The visualization of the responses can be
effectively manageable by the application layer. The reordering procedures of the inventory
helps in minimizing the repository of the data nodes on preserving the consistency of the
information on different parameters of the servers. The stock management program is
anticipated for developing the conceptual model to sense the data available on the different
data servers. The three tier architecture is developed for managing the request of the users to
fetch corresponding information from different data stores. The focus should be given on
accessing the frequency of the data retrieval process from heterogeneous platform of data
stores. The size of the data volume should be pre-processed and evaluated for resolving the
complexity of the data volume. The fragmentation of the information should be done in
managing the large data chunks of information. The distribution of the data types should be
evaluated according to the categorisation of the data sets. The size of the queue should be
measured for scheduling the process orientation and operations to generate effective
responses. The read and write requests of the data model should be managed for retrieving
the desired information from the voluminous data. The selection of the hardware and
software should be done according to the nature and size of the heterogeneous data gathered
in the large chunks of memory. The geographical location of the data can be calculated by
preserving the data network policies. The fault tolerance capability can be improved by
developing the replicated copies of the data on the data server. The information can be
switched over the network according to the development of user control policies. The
interruption of information can be handled securely for systematic scheduling and isolation of
the information. The queue size of the network management program should be developed
for accessing the information to improve the performance of the data handling program. The
network control protocol should be developed for improving the access time of the data from
different data servers. The metrics of data is prepared for synchronizing the flow of data
packets over the network. The threshold value of the inventory control is calculated for
comparing the underflow and overflow condition. The critical stage of the data should be
measured for managing the complexity of the data driven program. The authorised and
authenticated control should be developed for improving the data transaction over the period
of time. The distribution of the data stores helps in resolving the parameters of request to
synthesize the data accurately. The nearest data server is chosen for retrieving the data to
fulfil the user request by generating effective response. The inventory management program
is developed for authorising the data sets of information. The specific information is retrieved
from the warehouse to get effective information for improving the satisfaction level of the
user. The change of request focuses on identifying the shortest path for retrieving and sending
the information to the desired location of the data network centre. The shifting of the data bits
helps in calculating the flow of network packets between the sender and receiver. The
managerial control on the working model of the data stores can be predicted by optimizing
the volume of data packets. The synchronous growth of parametric value is used for
measuring and storing the inventory volume. The data retrieval can be done at faster rate by
replacing and optimizing the data sequentially to manage the integration between the data
units and data stores. The location of the resources should be measured for optimizing and
scaling the large data program to utilize the data packets effectively. The strategical approach
should be developed for scaling and controlling the large data volume at different data server.
The virtualization of the cloud models is used for scaling the data models for managing the
information at different location of the server. The real time information of the inventory
helps in improving the decision making capability of the working model of the processes and
procedures. The real time information can be fostered in the hierarchical order for resolving the
balance of information on the different modules of large data sets. The efficiency of the
inventory control system can be optimized by placing the RFID reader in the smart devices
for fostering the business intelligence program in the working model of the inventory
management system. The continuous status can be drawn by tracking the authorized control
of inventory management. The monitoring program of the inventory control system should be
based on time based and real time synchronous approach. The information is useful for
managing the analytical approach of information to take effective decision in complex
situation. The integration of business intelligence program helps in developing effective
knowledge management program for fostering the operational efficiency of the decision
making to perform the data mining operation in consecutive manner. The real domain of the
information should be drawn for minimizing the latency period of time to retrieve the
appropriate information. The identification of the problems helps in managing the historical
data of the actual events which takes place in the working model of the warehouse. The
operations and processes helps in improving the assessment of decision to correlate the
predictive maintenance program for tracking the information flow. The multiple sources can
be fetched for optimizing the aggregation of the strategic functions and business processes.
The smart decision can be effectively taken by analysing the real time availability of
inventory to process to information consequently. The coordination of the activities helps in
formulating the multiple decisions to manage the inventory in the warehouse to fulfil the
demand of the user. The key indicators should be used for driving the functional program of
logistics to create the maintenance program. The cost of the inventory can be optimized by
managing the cohesive decision to increase the operational efficiency of the working model
of the business processes. The value can be added to gain the managerial support to increase
the linking of the data units. The information can be tracked by outsourcing of the data at the
data server. The service level agreements are developed for improving the tracking of the
business processes. The delivery of products and commodity should be forecasted for
managing the availability of the inventory in the warehouse. The price of the commodity
should be evaluated by selecting the supplier from multiple locations. The planning of the
inventory can be effectively done for increasing the production policies. The logistics
program should be developed for planning the location of inventory in the warehouse. The
sales of the warehouse should be forecasted for optimizing the performance of the real time
inventory management system and control. The intelligence system is developed to analyse
the location and arrival of the new product in the warehouse. The sensor data and information
is collected by the RFID reader placed in the smart devices. The availability of the
information is compared with the threshold value of the data items for sending the alert signal
for inventory control on the mobile phone of the user. The predictive modelling program
helps in analysing and forecasting the need and demand of the inventory by the user so that
the request and response simultaneously can be effectively manageable. The visibility of the
information can be highlighted to achieve the balance and integration between the attributes.
The uncertainty of the inventory availability should be tested so that the balance of stock can
be effectively maintained. The sequences of operations are performed to measure the
statistical approach of data handling program. The decision and action of the inventory
management program can be effectively optimized for elaborating the business environment.
Pre-Processing functions:
The diversification of the resources and data can be manageable by minimizing the noise and
data redundancy so that consistency and reliability of the information can be achieved. The
shortest route should be chosen for transferring the relevant and desired information from the
data server so that data quality can be improved in terms of data privacy and security
(Bjorgvinsson, 2010). Some of the pre-processing functions are discussed below:
Data Integration function: The data integration functions are used for combining the data
from various formats and sources to give a unified view for data visualization. The ETL
method is implemented for extracting the information from the data warehouse. The ETL
stands for extraction, transformation, and loading function. The processing of the data
depends on the query posted by the user. The analytical approach is implemented for
aggregating the relevant information from different format. The performance of the data
retrieval can be optimized by the association of the batch processing and stream processing
system. The data can be easily searched in the stream of data. The diagram below shows the
processing functions which are used for managing the big data in an optimized format:
(Source: Made by author)
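A minimal ETL-style sketch in Python is given below to make the extract-transform-load sequence concrete; the file names, column names, and target table are assumptions made for the example only:

import sqlite3
import pandas as pd

# Minimal extract-transform-load (ETL) sketch. File, column, and table
# names are illustrative assumptions, not part of the proposed system.

# Extract: read raw inventory records from heterogeneous sources
sensor_df = pd.read_csv("sensor_readings.csv")   # assumed RFID/sensor export
orders_df = pd.read_json("orders.json")          # assumed order feed

# Transform: normalise column names and merge the two sources on the SKU
sensor_df = sensor_df.rename(columns=str.lower)
orders_df = orders_df.rename(columns=str.lower)
merged = sensor_df.merge(orders_df, on="sku", how="left").dropna(subset=["sku"])

# Load: write the unified view into a relational store for analysis
with sqlite3.connect("inventory_dw.db") as conn:
    merged.to_sql("inventory_unified", conn, if_exists="replace", index=False)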
Data cleansing function: The inaccuracy of the data can be minimised by applying the data
cleansing function. The errors can be minimised by correcting them. The modification and
manipulation of the errors can be done. The cleansing operations are performed to reduce the
redundancy. The tracking of the inventory can be done by using the RFID reader. The
checking of the anomalies can be done by applying the integrity constraints. The correction of the
errors and instances should be periodically carried out so that chance of anomalies occurrence
can be minimized. The quality of the responses can be improved by reducing the anomalies in
the data packets (Alam and Shakil, 2016). The cleansing process is comprised of various
checks which focuses on formatting testing, complete testing, reasoning testing, and limit
testing. The consistency of the data retrieving in different domains is used for managing the
configuration process systematically. The RFID readers are used for reading the data and
cleaning the files according to the user requirement. The distraction in the file due to the
occurrence of noise can be minimized by reducing the presence of anomalies in the data
transmission and storage process. The organization of the data packets should be done by
managing the integrity constraints on the files and data packages. The error detection and
recovery processes are used for eliminating the redundant data from the data packages. The
overloading and delay of the operational process can be minimized. The data sets are
prepared which increases the redundancy of the data on different data servers (Reddy and
Priya, 2018). The cleansing function is used for managing the relationship between the
different data sets for reducing the memory space occupied by similar information. The
accuracy in the data retrieval process can be achieved by deleting the repeated information.
The time duration for completing the activities can be minimized. The complex relationship
should be developed for achieving the accuracy in retrieving the required response with
respect to the query posted by the user.
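As a small illustration of the cleansing step, the sketch below removes duplicates, fills missing values, and applies a simple range (limit) check; the column names and valid range are hypothetical:

import pandas as pd

# Minimal data-cleansing sketch: de-duplication, missing-value handling,
# and a limit check. Column names and bounds are illustrative assumptions.
raw = pd.DataFrame({
    "sku": ["SKU-001", "SKU-001", "SKU-002", "SKU-003", "SKU-004"],
    "quantity": [120, 120, None, -5, 90000],
})

cleaned = (
    raw
    .drop_duplicates()                                     # remove redundant rows
    .assign(quantity=lambda d: d["quantity"].fillna(0))    # complete missing values
)

# Limit test: flag readings outside a plausible range instead of keeping them
valid = cleaned[(cleaned["quantity"] >= 0) & (cleaned["quantity"] <= 10000)]
rejected = cleaned.drop(valid.index)

print(valid)
print("Rejected rows:", len(rejected))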
The empty spaces of the memory can be minimized by removing the redundant information.
The compression methods are used for getting accurate information from the large data sets
of the heterogeneous environment. The statistical study of the data redundancy helps in
analysing the memory required for completing the portion of work in a synchronized format
(Borkar, and et.al., 2014). The sensor devices are placed for collecting sensor data of the
inventory in terms of audio and video files. The video data can be minimized through the
implementation of data compression techniques. The hashing methods are used for identifying
the information from the large data chunks. The storage space of the memory can be
minimized by applying the data compression techniques. The exploration of the background
data can be done for organizing and sequencing the data in structured format. The repeated
information is deleted from the large chunks of memory. The data is retrieved from the data
server. The pre-processing operations are performed for extracting the required data in
systematic format. The distributed data receives from heterogeneous format helps in
specifying the data sets. The mapping functions are installed for getting the right match of
information according to the query posted by the user. The extraction of the data value helps
in analysing the reliability and scalability of the information. The validation and verification
protocols are deployed for getting data quality. The processing operations are performed
for managing the multimedia data on the data server.
Direct Attached Storage: Direct attached storage is used for managing the storage devices for
systematically handling the host bus adapter system. The extension of the data server can be
effectively handled by keeping the storage devices in a sequential model. The structured
format of the files is used for utilizing the memory space in the best possible manner. The
aggregation of the networking devices helps in developing the specified infrastructure. The
virtualization of the storage devices helps in improving the visualization of the stored data to
increase the reliability and performance of the system.
Storage Area Network: The chunks of blocks are arranged to manage the data servers in
specified formats for easy handling of storage devices. The accessibility of the data can be
easily retrieved from the large volume of data. The storage capability can be improved by expanding the storage area network. The systematic view of
the networking devices can be formed to get the specified information from different
locations. The 4V data analytical model is developed for increasing the scalability and
virtualization of the file storage system on the cloud model.
Data and Information Management Framework System: The processing of the operations and
functions can be improved by managing the data and information in sequential formats. The
three layer architecture view is developed for storing the data and information in specified
formats and programming models. The heterogeneous data collected from different formats is
arranged in sequential programming models. The map reduce technology is implemented for
improving the level of operational performance of the working model. The data compression
techniques are used for eliminating the redundant information from various formats and
platforms. The MongoDB is used for managing the data in large data blocks of similar
information so as to accelerate the data retrieval process. The Google File System (GFS) is a distributed file system for keeping the data in specified formats, with HDFS as its open source counterpart. The fault
tolerance capability of the distributed file system can be improved by applying the read and
write operations. The performance of the data retrieval process can be improved by managing
the data files in the Hadoop distributed file system. The NoSQL database is used for
managing the complexities of the big data arranged in column database model and key value
pair model.
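For instance, files can be written to and read from HDFS from Python through WebHDFS; the sketch below uses the `hdfs` client package, and the namenode URL, user, and path are assumptions for illustration only:

from hdfs import InsecureClient  # WebHDFS client from the `hdfs` package

# Minimal sketch of storing a semi-structured inventory snapshot in HDFS.
# The namenode URL, user name, and target path are illustrative assumptions.
client = InsecureClient("http://namenode:9870", user="inventory")

snapshot = '{"sku": "SKU-001", "quantity": 240, "warehouse": "W-12"}\n'

# Write the record, then read it back to confirm the round trip
client.write("/data/inventory/snapshot.json", data=snapshot,
             encoding="utf-8", overwrite=True)

with client.read("/data/inventory/snapshot.json", encoding="utf-8") as reader:
    print(reader.read())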
Key Value pair management model: The key value pair model is used for handling the
request of the client and searching the data value according to the index value of the database.
The clustering of servers helps in keeping the replicated copies of the data model in different
formats and heterogeneous data formats. The partitioning of the data is done in multiple
version of data and information. The hashing function is used for retrieving the information
according to the query posted by the user. The consistency of the data can be managed by
assembling the data in concurrent format. The key value is stored in the column format of the
database for assembling the multi-dimensional view of the distributed data. The time-
stamping value is used for getting the accurate and specified information. The request and
response model is developed on the google file system for developing the big data table in
specified format. The documents of the data are easily manageable on the MongoDB. The
master node is prepared for performing all the writing operations on the database system so
that tracking of the information can be done on the basis of query response in the query
processor. The automation in the distributed data can be balanced by synchronizing the
replication of the data available stored in the data server. The transaction of the data can be
done through the MongoDB. The concurrency of the data can be managed by controlling the
data transaction from the replicated copies of data. The fault tolerance capability of the data
stores can be improved by dividing the data into partition and large chunks of similar
information. The Read and Write lock mechanism is used for handling the consistency of the
cloud infrastructure. The map reduce technology helps in eliminating the repeated
information from the data server. The scheduling of the read(), write(), and update() functions is used to aggregate the files from the cloud environment. The stream
processing model is used for managing the route of the event triggered in the scenario. The
processing elements are used for developing the acyclic graph to manage the route. The
channelization of the data can be scheduled for streaming the flow of data between the vertex
and processing nodes. The real time retrieval of the information can be easily carried out by
handling the processes and request in the batch processing unit for aligning the flow of data
streams in sequential formats.
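A small key-value style access pattern on MongoDB, matching the description above, could look like the sketch below; the collection, index, and document fields are hypothetical examples:

from pymongo import MongoClient, ASCENDING

# Minimal key-value sketch on MongoDB: index on the key, upsert writes,
# and lookups by key. Names and fields are illustrative assumptions.
client = MongoClient("mongodb://localhost:27017")
kv = client["inventory_db"]["kv_store"]

# Secondary index so lookups by key do not scan the whole collection
kv.create_index([("key", ASCENDING)], unique=True)

# Write (upsert): the master node would perform such writes in the design above
kv.update_one(
    {"key": "SKU-001"},
    {"$set": {"value": {"quantity": 240, "warehouse": "W-12"}}},
    upsert=True,
)

# Read: fetch the value by key
doc = kv.find_one({"key": "SKU-001"})
print(doc["value"])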
The involvement of the data analytical approach helps in extrapolating the interpretation of
the information as demanded by the user. The legitimacy of the data can be checked for
managing the data dependency. The decision making capability of the user can be increased
by aggregating the information from different formats and creating the heterogeneous
environment. The fault tolerance capability can be predicted by diagnosing the reason of
failure. The statistical view of the data should be developed for managing the diversification
of data in different formats. The qualitative and quantitative data should be managed for
summarizing the data into a single format. The data mining operations are performed for
extracting the relevant information from the data warehouse. The data analysis process is
carried out in three levels which are named as description analytical approach, prediction
analytical approach, and prescription analytical approach.
Description analytical approach: This approach is used for managing the historical data in big
table format. The data sets are prepared by managing the logical regression methods to
extract meaningful information and getting visualization of the data formats. The business
intelligence models are deployed for increasing the data visibility in simplified formats.
Prediction analytical approach: This model helps in predicting the probabilities in managing
the statistical approach of the data sets. The linear and logical regression techniques are used
for forecasting the trends and data mining operations on the data warehouse for retrieving the
updated information.
Prescription analytical approach: The decision making capabilities can be improved by
managing the flow of information to address the data complexities. The optimal solutions can
be presented to minimize the complexities of the data and information stored in the large
chunks of data sets.
The complexities of the Hadoop distributed file system can be managed by synchronizing the flow of information, managing block operations on data, optimizing the flow of input and output data, scheduling processes, performing join operations on the data stored in different data blocks, tuning HDFS performance, and optimizing energy utilization.
The HaLoop algorithm is used for managing the flow of information in systematic format to
indulge the map-reduce technique for increasing the execution of the files and information
stored in the data chunks. The flexibility in retrieving the information can be achieved by
designing the loop algorithm programs. The block operators are used for synchronizing the
map reduce function for blocking the job to finish it in prescribed time and space. The
abstraction layer is formed for optimizing the performance of the distributed engine. The hash
tables are prepared for pre-processing the read and write operations to get the updated
information by searching through the hash index value. The incremental approach is
implemented for processing the data tables in specified formats. The data compression
techniques are implemented for reducing the Input output cost on managing the data in the
Hadoop file system. The Database indexes are prepared for searching the data in sequential
format. The Hadoop scheduler arranges the data in heuristic order for executing the given
tasks in systematic order. The end to end scheduling of the processes is done for optimizing
the management of resources in dynamic environment. The Join operations are performed for
joining the information to get a single unit of data. The functions are classified into two
categories which are named as Map side function and reduce side function. The map side
function helps in mapping the relevant information in specified format. The reduce side
function is used for reducing the redundant data from the memory to free the memory spaces.
The configuration of the application can be done for optimizing map reduce function. The
operations on the Hadoop node can be controlled by managing the replicas of data on
different job queue so that processing time of the operations can be minimized. The
complexity of the system can be reduced by aggregating the functions and processes in single
unit for increasing the performance of the database. The performance of the data sets can be
optimised by sequencing the real time data in data chunks. The benchmarking technique is
used for finding the normal threshold value of the proposed system so that underflow and
overflow condition of the commodity can be tested. The switching operations are performed
for getting the relevant information from the data model.
Chapter 5: Implementation and Results
The implementation of the big data model proposed architecture for managing the inventory
of the organization helps the user to take proactive action for managing and synchronizing the
availability of the commodity. The danger to the availability of the inventory can be predicted
by comparing the present availability with the threshold normal value. The Spark streaming
process is used for getting the required and updated information of the data from the data
warehouse. The MongoDB is used for reading the information of the inventory from the
batch processing unit. The notification and alert signals are sent to the user for taking
proactive action plan to keep the balance between the availability of the commodity
according to the demand placed by the user. The real time information collected from
different real time analysis devices is stored in the Spark streaming process (Prasad, and
et.al., 2014). The spark streaming process sends the required information to the MongoDB
for sending the alert signal on the mobile devices of the user in case of emergency situation
of underflow and overflow availability of the commodity. The batch processing unit works after a span of time to process the next query sent by the user. The Spark streaming process
maintains the real time information about the availability of the commodity to check it and
map it with the threshold normal value. The abnormal situation of the data availability can be
checked by analysing the information stored in the MongoDB which is used for sending the
real time information. The classification of the data is done by implementing the logistic
regression model. The implementation process is comprised of a sequence of steps which
includes the data collected from the sensor devices placed in the warehouse to keep track of
the real time information. The data streams are created which are stored in the spark
streaming process for filtering the required data (Kumar, and et.al., 2016). The logistic
regression model is applied to the output of the spark streaming process for filtering and
capturing required data from the voluminous stream of data. The abnormal situation of the
data availability for predicting the underflow and overflow situation of the inventory is
detected by applying the logistic regression model. The output of the logistic regression
model is stored in the MongoDB for sending alert signal to the mobile phones of the user so
that proactive action plan can be taken for balancing the inventory control management
program. The proposed architecture is based on inventory management system for managing
the voluminous data of the organization in relation to the inventory. The focus should be
given on managing the updated information of the available inventory to send alert signal to
the mobile phone of the user (Biswas, and Sen, 2016). The data stores are created for specific
information which accelerates the retrieval of the information from various sources. The
critical data of the inventory should be specifically judged under emergency situation. The
frequent step should be taken for managing the balance of inventory in the warehouse. The
reordering of the available inventory is an important step to manage the freshness and
advancement of the product. The data should be organized by sensing the present condition
of the available data in the warehouse. The focus should be given on minimizing the
replication of the commodity by managing the particular blocks of information. The
balancing of the data can be done by active management of the storage server. The
information is optimised by maintaining the user request and checking the availability of the
order in the warehouse to increase the level of customer satisfaction (Kumar, and
Chaturvedi, 2017).
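To make the implementation flow concrete, the sketch below follows the same pattern described above (stream readings in, flag abnormal stock levels, persist them for alerting). It uses Spark Structured Streaming with a socket source and a foreachBatch sink into MongoDB; the host, port, thresholds, and collection names are all illustrative assumptions rather than the exact configuration of the proposed system:

from pymongo import MongoClient
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Minimal sketch of the streaming path: read readings, flag abnormal stock
# levels, and store them in MongoDB for alerting. Source, thresholds, and
# collection names are illustrative assumptions.
spark = SparkSession.builder.appName("inventory-stream").getOrCreate()

# Assumed input: lines of the form "SKU-001,37" arriving on a socket
raw = (spark.readStream.format("socket")
       .option("host", "localhost").option("port", 9999).load())

readings = raw.select(
    F.split("value", ",").getItem(0).alias("sku"),
    F.split("value", ",").getItem(1).cast("int").alias("quantity"),
)

flagged = readings.withColumn(
    "status",
    F.when(F.col("quantity") < 50, "underflow")
     .when(F.col("quantity") > 500, "overflow")
     .otherwise("normal"),
).filter(F.col("status") != "normal")

def write_alerts(batch_df, batch_id):
    """Persist each micro-batch of abnormal readings into MongoDB."""
    docs = [row.asDict() for row in batch_df.collect()]
    if docs:
        MongoClient("mongodb://localhost:27017")["inventory_db"]["alerts"] \
            .insert_many(docs)

query = flagged.writeStream.foreachBatch(write_alerts).start()
query.awaitTermination()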
Some applications of the proposed architecture for managing the voluminous data of the
supply chain management system are given below:
Marketing: It helps in analysing the marketing trends and technologies used for managing the
customer demand in an effective manner. The data sets of the products are prepared
according to the demand placed by the customers. The data sources are integrated for
applying mapping function to get accurate result of the data mining operations. The Logistic
regression model helps in getting the relevant information from the huge amount of
information.
Procurement process: The information related to the negotiation process of the suppliers is
collected in the separate data blocks for getting the accuracy in the financial information of
the inventory control management system. The service level agreement is drawn for
improving the relationship with the supplier for managing the inventory supply to meet the
rising requirement of the customers. The alert signals are sent to the supplier for managing
the demand and supply of the inventory when the underflow and overflow condition of the
commodity occurs in the data warehouse.
Warehouse management program: The sensor devices are placed in the warehouse for
tracking the real time availability of the commodity to perform relevant mining operations.
The sensor data is collected in the spark database for undergoing the filtering process. The
data is captured from multiple sensing devices. The aggregate functions are used for
monitoring the assets available in the warehouse.
Transportation mechanism: The real time monitoring and tracking of the shipment for
managing the inventory in the warehouse can be optimised by finding the path by real time
monitoring. The GPS controllers are used for finding the location and weather condition of
the shipment. The variation in the delivery time of the product in the warehouse can be
effectively predicted before their occurrence which helps in taking proactive action in
emergency situation to overcome the condition of overflow and underflow. The spatial
regression modelling is used for finding the traffic density for managing the supply of inventory to
the warehouse.
The protocols which can be used for securing the data in the proposed big data management
system architecture are named as Federal access and delivery infrastructure (FADI),
extensible access control mark-up language (XACML). The secured and reliable information
will be available on user request. It helps in developing trust on the proposed model. The
third party will not be able to access the information without getting the authorisation and
authentication from the user. The functionality of the infrastructure component is securely
defined (Hashem, and Ranc, 2016). The algorithm focuses on managing the cloud computing
interaction for securing the privacy of the information. The data can be accessed by providing
the domain of identity providers for providing the authorization and authentication of
managing the service agent. Access control policies are developed for enabling the control
mechanism for implementing the service gateways. The data centric access control protocol
is used for managing the privacy and sensitivity of the data. The document can be accessed
through the extensible access control mark-up language. The peer to peer communication
can be manageable by implementing the service gateways. The XACML protocol is used for
managing the confidentiality of the data structure used for managing the complexities of the
documentation and information stored in the database. The NoSQL database is implemented
for preserving the data in row-wise and column-wise format. The organization of the policies
helps in minimizing the overall computational time and cost. The data can be securely
accessed from the remote location. The encryption policies are developed for protecting the
data in a structured format.
Map Reduce Technology
The map reduce technology is used for managing the large data sets in the smaller chunks of
data blocks for retrieving the information in the user defined format. The processing of the
data blocks is done by using the map function () and reduce function (). The hierarchical
structure of the information and data is prepared for performing the mapping function and
accelerating the data retrieval processing time. The master-slave framework is prepared for
managing the flow of information between the master and slave data nodes. The data load is
equally divided into several data blocks for scheduling the information and resources.
Components of Map Reduce Technology:
Name Node: The name node manages the metadata of the files in the Hadoop cluster environment. A file can be located by the user through the name node lookup, and the information and data are stored in a replicated format.
Job Tracker: The job tracker tracks the execution and processing of the data components according to the planned scheduling architecture. The operations and functions are divided among task trackers to accelerate the processing of the data components, and the job size and memory footprint are reduced by applying data compression techniques. Data can be fetched and tracked through the job tracker to obtain relevant information, and scheduling is performed by monitoring the processes and the flow of operations in the desired format. The values emitted for each input key are summed to produce the desired output value. MapReduce helps locate data held in compressed form, and the mapping of files is performed against the physical data stored on the data server. The processes run in parallel so that data operations are performed in an optimised way and complete within the allocated time, which increases the fault tolerance of the processes and functions used for file handling. If a failure occurs, the job is re-executed to obtain the specified output, and the file is completed within the memory occupied on physical storage. Load balancing across the memory chunks is handled effectively through the map and reduce functions, while the pre-processing step manages the data held on multiple nodes. The open-source computation model increases the scalability of the data nodes beyond a single physical data server, and computation across multiple nodes is increased through the parallel computing paradigm so that scaling can be automated. The map and reduce functions process key–value pairs and generate intermediate results. The model is designed to provide access control to the end user so that relevant information can be retrieved instantaneously in response to a query posted on the portal. Frequently accessed information is managed with concurrency control on the data units, aggregate functions are applied so that the data is stored sequentially in the Hadoop file structure, and data can be accessed directly from the data structure, which accelerates the business functions operating on the data units.
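To make the key–value flow of the map() and reduce() functions concrete, the following sketch simulates a word count in plain Python: the map step emits (word, 1) pairs, the intermediate pairs are grouped by key, and the reduce step sums the values for each key. It illustrates the programming model only and is not Hadoop code.

from collections import defaultdict

def map_phase(document):
    """Emit an intermediate (key, value) pair for every word in one input split."""
    for word in document.split():
        yield word.lower(), 1

def reduce_phase(key, values):
    """Sum the values collected for one key to produce the final output pair."""
    return key, sum(values)

documents = ["pallets shipped today", "pallets received today today"]

# Shuffle step: group the intermediate pairs by key before reducing.
grouped = defaultdict(list)
for doc in documents:
    for key, value in map_phase(doc):
        grouped[key].append(value)

result = dict(reduce_phase(k, v) for k, v in grouped.items())
print(result)  # {'pallets': 2, 'shipped': 1, 'today': 3, 'received': 1}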
Three major components are used for retrieving the desired information: the mapper, the reducer, and the master. The master node schedules jobs across the data nodes so that the desired operations are performed, and tasks are scheduled according to the distributed file management system. The map step parses the input with the help of a parsing function and uses remote procedure calls to carry out the desired operations. The data is sorted to generate the key–value pairs on which the map and reduce functions are defined, and the fault-tolerance capability increases the scalability of the distributed environment. The reducers issue remote procedure calls to read data from the files and large data blocks. The diagram below shows how the mapper and reducer functions are assembled over the sorted data.
Mapper Function:
The mapping phase takes input in a specified format at the data nodes and divides it into smaller chunks of memory. Data integration takes place between the master node and the slave nodes so that the required data reaches the specified location, and a tree-structured format is used to organise the worker nodes systematically. The mapper function sends the result of the search back to the master node.
Reducer Function:
The sub-files are collected from the heterogeneous environment by the worker nodes, and the output is presented on the visualization layer. Key performance indicators associate a key value with the index value handled by the processor. A Map Code () routine organises the range of key values needed to obtain the desired search result, and a shuffling operation reduces the workload on the processors while producing the relevant result. The sorted data produced by the mapping step is kept in order so as to reduce the load on the intermediate functions, and the reduce function is then applied to bring the output data into a synchronised format. Executing the data processes in this way also compensates for the failure of data nodes during mapping, because the checkpoint location of the data can be found and the work resumed.
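The shuffling step described above must route every intermediate key to exactly one reducer. A common way to do this is hash partitioning, sketched below in Python; the modulo rule mirrors the idea behind Hadoop's default hash partitioner, but the code itself is only an illustration.

def partition(key, num_reducers):
    """Assign an intermediate key to a reducer bucket; identical keys always
    land in the same bucket, so each reducer sees all values for its keys."""
    return hash(key) % num_reducers

intermediate = [("pallets", 1), ("today", 1), ("pallets", 1), ("shipped", 1)]
buckets = {r: [] for r in range(3)}          # three reducers in this toy example
for key, value in intermediate:
    buckets[partition(key, 3)].append((key, value))
print(buckets)                               # all ("pallets", 1) pairs share one bucket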
Re-executing failed nodes increases the operational resilience of the processes and procedures. A global file system stores the data sequentially, which reduces the overhead of the data retrieval process, and a memory buffer handles the metadata needed to fetch information from the heterogeneous environment. The MapReduce technique divides the input into partitions whose size typically varies between 16 MB and 64 MB, and each partition is mapped as the work is executed. Temporary key–value pairs are generated when tasks are assigned by the master node, and remote procedure calls help contain the effect of faults on the master node. Parallel tasks split the files into small partitions, the processes are mapped through key–value analysis of the data, and the shuffling phase generates the key–value pairs that correspond to the reduce function. MapReduce then collects the jobs into a single unit to produce organised output, which balances the flow of information and keeps the data organised in the distributed file format. The schema is designed flexibly so that the MapReduce functions applied to the database schema can be explored, and the scalability of the data model can be increased by synchronising the files according to the schedule. The job tracker finds new jobs and manages the memory space in a simplified way, while the job status function reflects the processing performed in real time on the data stored in the database. An alert signal about the inventory status is transmitted to the user, providing relevant information for the inventory control and management programme. New data sets are created to keep responses to user queries synchronised, and throughput when accessing the output data is optimised by scheduling the tasks through the task tracker. The data blocks are divided into 64 MB units so that the required information can be retrieved from the large volume of data, clusters of information are mapped through key–value index pairs, and the split data nodes are integrated through the mapping function. The master location is searched first, which reduces the workload of searching across many data nodes, and the desired data can be located by pairing the buffered memory locations, accelerating the search across the data nodes. The data remains sorted to reduce the load on the intermediate functions, the reduce function synchronises the output, and checkpoints allow failed data nodes to be recovered during mapping. The desired location can be fetched so that data is stored temporarily, the MapReduce function organises the files in a hierarchical data structure, and the execution process keeps the files in a sequential structure. Alert signals are generated to report the behaviour and availability of inventory in the warehouse management system, and symbols and semantics are used to develop the non-deterministic model.
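The 16 MB to 64 MB partition size mentioned above can be illustrated with a small Python helper that cuts a local file into fixed-size blocks ready to be handed to independent map tasks. The 64 MB figure follows the text; the function itself is an illustrative sketch, not how HDFS splits files internally.

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the upper bound mentioned above

def split_into_blocks(path, block_size=BLOCK_SIZE):
    """Yield successive fixed-size byte blocks of a local file; each block
    can then be processed by its own map task."""
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            yield block

# Example (path is a placeholder):
# n_blocks = sum(1 for _ in split_into_blocks("inventory_log.bin"))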
The HDFS system can be extended by implementing a file manager together with the MapReduce technique so that data files are managed in a systematic format. The file manager is responsible for locating information within the small files that have been arranged into aggregated blocks.
File Management System: The file management system retrieves data through the name-node searching model, and the distribution of files is handled by managing the mutation property. Four major functions are used for retrieving the relevant information from the database.
File Integrator Function (): The file integrator function defines how the file manager retrieves relevant information from the HDFS blocks. The data is arranged in index order; each index entry consists of an ID number and the name node together with the file-name description, which makes searching the data straightforward. The files are stored in a distributed format so that they can be retrieved from multiple data servers. The following steps describe the file integrator algorithm for managing files in sequential format.
Step 1 – Initiate the file integrator function for managing the data in sequential format.
Step 2 – Initiate the file counter (fc++) to create the new file.
Step 3 – Initiate the Read () function to read the data stored in the current file n of the index.
Step 4 – Write the data of file n to file m.
Step 5 – End the file integrator function.
The flowchart for the file integration process follows the same sequence of steps, from Start to Stop.
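A minimal Python sketch of the file integrator idea is given below: it appends the content of many small files into one container file and records, for each entry, an ID, the file name, and its offset and length so that the read operation can locate it later. The index layout is an assumption made for illustration only; it is not the HDFS on-disk format.

import json, os

def integrate_files(small_files, container_path, index_path):
    """Append each small file to a single container file and build an index
    of (id, name, offset, length) entries so files can be located later."""
    index = []
    with open(container_path, "wb") as container:
        for file_id, path in enumerate(small_files):
            offset = container.tell()
            with open(path, "rb") as f:
                data = f.read()
            container.write(data)
            index.append({"id": file_id, "name": os.path.basename(path),
                          "offset": offset, "length": len(data)})
    with open(index_path, "w") as f:
        json.dump(index, f, indent=2)
    return index

# Example (file names are placeholders):
# integrate_files(["a.txt", "b.txt"], "block_0001.bin", "block_0001.index.json")

Merging small files in this way is what lets a single block hold thousands of logical files while the index keeps each one individually addressable.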
File Read () Operation: The file read function reads the content of the requested file from the Hadoop cluster of files. The data can be read from a command-line argument: the user enters the file name, the file is searched for in the Hadoop cluster, the data is mapped so that the content of the current file can be read, and the data block is displayed on the visualization layer.
Step 1 – Initiate the File Read () operation.
Step 2 – The user enters the name of the file to be read on the command line.
Step 3 – Search for the file with the help of the file integrator function.
Step 4 – Map the file name in the Hadoop cluster.
Step 5 – Locate the Hadoop distributed block returned by the search.
Step 6 – Display the result of the read function by showing the content of the file.
Step 7 – Report "File not found" if the file name entered by the user is not available in the Hadoop cluster.
Step 8 – Complete the File Read function.
The flowchart for the read () operation follows the same sequence of steps, from Start to Stop.
File Modification () Function: The file modification function modifies the data and information in a file held in the Hadoop cluster according to the user's request. Command-line arguments are used to modify the program stored in HDFS, the search operation integrates data from the heterogeneous environment, and the modified file is saved back into the HDFS cluster for future use.
Step 1 – Initiate the file modification function.
Step 2 – Search for the file whose data is to be modified.
Step 3 – Apply the file integration function to locate the file in the warehouse.
Step 4 – Map the file name to its block ID in the HDFS environment.
Step 5 – Read the content of the file found in the HDFS cluster.
Step 6 – Visualise the data content.
Step 7 – Modify the file as requested by the user.
Step 8 – Save the file as requested by the user.
Step 9 – Display the message "File not found" if the file is unavailable in the warehouse.
Step 10 – Complete the read-and-modify function.
The read and write operations apply late-binding procedures, which improves the mapping of data drawn from the various operational stores of information. Repositories of data are developed to meet the access-control requirements, and the logical infrastructure of the Hadoop file system combines the downstream data so that accuracy and trust in the information are achieved. Schema designs should be developed for managing the operational view of the business processes, and the latency of query responses can be minimised to improve the consistency of the data units. The flowchart for the modify () operation follows the same sequence of steps, from Start to Stop.
File Delete Operation: The file manager reads the file from the warehouse, and corrupted files can be removed from HDFS on user demand. The delete program maps the block ID so that the information stored in the block can be deleted, and the file integration format is used for locating the file in the warehouse.
Step 1 – Initiate the file delete function.
Step 2 – Search for the file by applying the file integration function.
Step 3 – Locate the file requested by the user.
Step 4 – Map the file to its block ID in the Hadoop file cluster.
Step 5 – Find the file in HDFS.
Step 6 – Display the content of the file.
Step 7 – Give the user the authority to delete the content of the file.
Step 8 – Display the message "File not found" if the file is unavailable in the warehouse.
Step 9 – Complete the delete function.
The flowchart for the delete () operation follows the same sequence of steps, from Start to Stop.
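Building on the integrator sketch above, the following illustrative Python functions cover the read, modify, and delete operations described in this chapter. They operate on a local container/index pair rather than on HDFS itself; the index format and function names are assumptions carried over from the earlier sketch.

import json

def _load_index(index_path):
    with open(index_path) as f:
        return json.load(f)

def read_file(container_path, index_path, name):
    """Look the file up by name in the index and return its bytes,
    or None when the name is absent ("File not found")."""
    for entry in _load_index(index_path):
        if entry["name"] == name:
            with open(container_path, "rb") as container:
                container.seek(entry["offset"])
                return container.read(entry["length"])
    return None

def modify_file(container_path, index_path, name, new_data):
    """Replace one file's content: keep every other entry, append the new
    data, and save the rebuilt index. Returns False if the name is unknown."""
    index = _load_index(index_path)
    kept = [e for e in index if e["name"] != name]
    if len(kept) == len(index):
        return False
    return _rewrite(container_path, index_path, kept, name, new_data)

def delete_file(container_path, index_path, name):
    """Remove a file's entry and its bytes from the container."""
    index = _load_index(index_path)
    kept = [e for e in index if e["name"] != name]
    if len(kept) == len(index):
        return False
    return _rewrite(container_path, index_path, kept, name, None)

def _rewrite(container_path, index_path, kept, name, new_data):
    """Rebuild the container from the kept entries, appending new_data if given."""
    with open(container_path, "rb") as container:
        blobs = []
        for e in kept:
            container.seek(e["offset"])
            blobs.append((e["name"], container.read(e["length"])))
    if new_data is not None:
        blobs.append((name, new_data))
    new_index = []
    with open(container_path, "wb") as container:
        for file_id, (entry_name, data) in enumerate(blobs):
            offset = container.tell()
            container.write(data)
            new_index.append({"id": file_id, "name": entry_name,
                              "offset": offset, "length": len(data)})
    with open(index_path, "w") as f:
        json.dump(new_index, f, indent=2)
    return True

Appending modified content at the end of the container (rather than rewriting in place) keeps the sketch simple; a production system would instead mark stale regions and compact them periodically.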
Result
The sensor acquisition layer manages the objects and the raw data and information in a sequential format. Sensors and RFID readers address the connections between the different objects by using IPv6 addresses, and the main source of heterogeneous data is the sensor data of the IoT application. The data of retailers, logistics providers, and distributors is arranged in a structured format so that information retrieval can be performed effectively. The cloud infrastructure is suitable for managing the flow of information from one data object to another, and communication takes place over a data bus that increases the flow of information between the various units. Processes are optimised through data storage, and the data analytical engine is the core function of the data management process. Smart processing is achieved through the extraction process, and uncertainty in business intelligence is reduced through prediction.
The data visualization and rendering system supports effective decision making. Interoperation and robustness of the sub-systems that manage replicated data allow the required data to be fetched from the database. Job trackers assign work to the slave nodes, and the location of the data nodes can be tracked by placing the job tracker appropriately. Data is fetched according to the processes queued in the batch processing unit, real-time intelligence helps manage the data in the data store, and a dimensional data store presents the real-time visualization. Data quality is balanced in the database, the security and confidentiality of the data are managed effectively, and granularity is achieved by organising the data in row-wise and column-wise format. The user interface incorporates OLAP technology for posing queries and for sending alert signals to the users' mobile phones. Public and private keys are generated to authorise the data and protect the information payload, communication is established through a message protocol between the processes, reliability is achieved by measuring the fault-tolerance parameters, resources are optimised by increasing efficiency and data utilisation, and access control is provided to the system by encrypting the information.
The table below analyses how the big data characteristics apply to managing the data of the inventory control management system:
Volume
- Supply of inventory: product design details and relevant information; size of the inventory required; preparation of the shipment; delivery to the required place.
- Manufacturing process: analysis of customer requirements; management of demand; managing the process throughput cycle; determining the capability of the inventory; managing the data required by the client; determining the financial cost of the inventory.
- Demand management: variation in customer demand forecasting; management of scheduled deliveries; tracking of the stock level; pricing of the stock; analysing customer requirements and feedback; promoting the balance of inventory control.
- Sales and purchase: analysing variation in demand forecasting; lead time of product feedback; extracting the features of the product; analysis of throughput time; organisation of pricing and payment; analysis of the inventory available in the warehouse; managing demand by purchasing new and required inventory.

Variety
- Supply of inventory: inclusion of a variety of databases; organisation of telephonic conversations; reading of RFID data; development of XML documents; development of physical data and documents.
- Manufacturing process: organisation of RFID readers, in-built microchips, web cameras, and other devices.
- Demand management: generation of physical and XML documents; collection of sensor data in a structured format; organisation of telephonic and web data; synchronisation of customer requests.
- Sales and purchase: collection of RFID data; creation of sensor data; organisation of the structured database; systematic arrangement of telephonic data.

Velocity
- Supply of inventory: organisation of inventory information on an hourly, daily, weekly, and monthly basis to identify trends in underflow and overflow of the inventory available in the data warehouse.
- Manufacturing process: the same hourly, daily, weekly, and monthly organisation of inventory information to identify underflow and overflow trends.
- Demand management: the same periodic organisation of inventory information, with a focus on collecting real-time information from various sources.
- Sales and purchase: the same periodic organisation of inventory information, together with analysis of the sensor data collected from the real-time processing schedule.

Value
- Supply of inventory: the details and specification of a new product should be arranged systematically so that the flow of information matches the queries posted by the user.
- Manufacturing process: the size of the inventory should be optimised to support product decisions and the selection of the required supplier for effective routing decisions.
- Demand management: a congestion control mechanism should be in place for systematic scheduling of packets over the network (Bhosale and Gadekar, 2014), with a focus on analysing customer requirements and product quality.
- Sales and purchase: a predictive demand analytical model is used for capturing customer requirements and preparing the process queue; clustering of processes supports the batching system that defines the data stream and accelerates data retrieval.

Veracity
- Supply of inventory: organisation of heterogeneous data from multiple platforms; managing the reliability of the network under congestion; systematic management of noise.
- Manufacturing process: organisation of heterogeneous data held in different formats.
- Demand management: organisation of heterogeneous data from multiple platforms; managing network reliability under congestion; systematic management of noise.
- Sales and purchase: organisation of heterogeneous data from multiple platforms; managing network reliability under congestion; systematic management of noise.
The success of the data model depends on the availability of the data requested by the user, and it helps in managing changes in the inventory control system. Priority should be given to data security and privacy so that the data is protected from third-party attacks. The proposed system helps in managing large volumes of data in a structured
format, which speeds up the data retrieval process. The data is organised in a distributed file system so that replicated copies are held on different servers, and processing takes place in a highly secured environment. The main focus is on maintaining data integrity and reliability with respect to the inventory available in the data warehouse. The graph below shows a statistical view of the available inventory obtained by implementing the proposed big data architecture (Source: Made by author).
The proposed architecture also presents the financial status of the inventory available in the warehouse in statistical form, as depicted in the figure below:
(Source: Made by Author)
The applicability of the proposed architecture can be assessed by comparing the time taken to search for a file in the local database. The processing time and computation speed should be measured to determine the efficiency of the proposed database system. The sequencing of the files is optimised by developing Hadoop clusters, 4.2 GB of RAM is allocated to each data node for storing its data, and the documents are prepared in XML format. Interaction between the data nodes is improved by implementing a command interface between them, the cluster status is checked by keeping the data nodes and name nodes in balance, data is accessed from the file system through streaming, and faults in the data and information can be identified by using the Hadoop file structure to store the application's big data.
Overview of the workload management system: Around 4000 files can be systematically synchronised in compressed form. The data and files are stored in semi-structured and structured formats, and compression techniques are used to present the files in a systematic manner (Dean et al., 2014).
Performance: The performance of the proposed architecture is measured in terms of the time taken to manage the data in the Hadoop file system and the memory required to store the data on the name node and data server. Efficiency depends on completing each data operation within the pre-specified time, and the overall estimate of memory space and time indicates the efficiency of the complete system.
Load balancing of the data: The data load is balanced on the HDFS system by combining the small file chunks into large data blocks. The time required to complete the load-balancing process is the sum of the time to convert the small files into large files and the time to move the merged files to the HDFS cluster (Elsayed et al., 2014).
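Expressed as a formula, with symbol names chosen here only for illustration, the load-balancing time described above is simply

T_{\text{balance}} = T_{\text{merge}} + T_{\text{transfer}}

where T_merge is the time to convert the small files into large blocks and T_transfer is the time to move the merged blocks into the HDFS cluster.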
Memory management system: The proposed model compares the name node with the memory management system in terms of the memory used for storing data and information on the name node (McCreadie, Macdonald and Ounis, 2011). The computation speed of the memory management system depends on the memory occupied by the particular name node, the total number of cluster blocks, the metadata size, and the index size. An estimated 4000 files can be stored in a single data block, and the file integrator function is used to integrate those files into a single data block.
Processing time: The processing time for finding a data file depends on the time required to fetch the data packets, and sequential management of the files optimises information retrieval. The research shows that the Hadoop distributed file system takes 2.6 seconds to retrieve the files (Saha et al., 2014), the sequential file organisation takes 1.6 seconds, and the Hadoop archive file takes 0.98 seconds, whereas the proposed architecture retrieves the information within 0.51 seconds.
The Hadoop distributed file system is the most widely used framework for managing large data blocks and for handling files in a distributed environment. The performance of the system depends on retrieving files within a fraction of the time, and processing the files optimises the sequential arrangement of the data in a structured format. The small files are arranged in a systematic order by applying the MapReduce framework (Landset et al., 2015), and the file manager handles each file according to the name node recorded in the index. The research shows that the proposed system delivers the files required by users in minimal time. The data storage capability is also optimised for handling large amounts of data systematically: the proposed architecture consumes very little memory for its functions and operations and stores the metadata alongside the large data chunks. System performance is measured by the time taken to retrieve data from the heterogeneous sources; small memory chunks store the metadata on the name node sequentially, which minimises the data retrieval and processing time and optimises the overall performance of the system. Attention is given to the time required to load data into memory for the read, write, and modify operations. Dividing the large blocks of data into small files and blocks shortens the time needed to load the files into memory for data retrieval and management. The measurements show that a 10 GB file is uploaded by the Hadoop file system in around 460 seconds, by the sequential file system in around 590 seconds, and by the Hadoop archive file system in around 510 seconds, whereas the proposed architecture takes only 300 seconds to load the data into memory by dividing the large data blocks into smaller files. A file is searched on the basis of the name node that manages the file index, and the file integration function locates the file in the warehouse so that it can be retrieved within a minimum period of time and memory
space. The read operation reads the content of the file so that the desired function can be performed. The file modification function modifies the requested information after reading the file's content and saves the file under the same name node in its updated form, while the delete operation permanently removes the information from memory. Including the MapReduce function and data compression techniques optimises the memory space occupied by the files and information (Polato et al., 2014), and the mapping process accelerates the file integration function when searching for the desired file under the name-node index.
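The data compression step mentioned above can be illustrated with Python's standard gzip module: compressing each small file's bytes before they are appended to the merged block reduces the space they occupy, at the cost of a decompression step on read. This is only a sketch of the idea, not the compression codec the proposed system necessarily uses.

import gzip

def compress_block(data: bytes) -> bytes:
    """Compress a file's bytes before appending them to the merged block."""
    return gzip.compress(data)

def decompress_block(blob: bytes) -> bytes:
    """Restore the original bytes when the file is read back."""
    return gzip.decompress(blob)

sample = b"pallet-id,quantity\n1001,250\n1002,175\n" * 100
packed = compress_block(sample)
print(len(sample), "->", len(packed), "bytes")  # repetitive inventory rows compress well
assert decompress_block(packed) == sample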
Visualization of the data sets helps manage the data in sequential formats, and the information can be used to obtain the real-time status of the inventory. The efficiency of the system depends on optimising the big data route used to collect sensor data from the sensing devices. Block IDs are generated automatically so that the data packets are sequenced in increasing order, RFID readers collect sensor data from various sources to identify the inventory available in the warehouse, and the physical and logical flow of the data allows global information about the sensor devices to be retrieved. This helps manage the complexity of the inventory control system (Demchenko et al., 2014).
The NoSQL database stores data from multiple stores and data centres so that it can be synchronised with user demand and quick responses can be generated. The big data characteristics of variety, veracity, volume, and value can easily be arranged in the layered reference model. The functionality of the processes is increased by applying read, write, and aggregate operations to the data units, and user-defined functions built from regular expressions enhance the functionality of the data units stored at multiple server locations. The distributed file system applies the MapReduce technology to real-time batches of processes in order to minimise latency, and interactive queries posted by the user are processed with instant responses so that the workload on the data server is managed. OLAP technology is used to implement the predictive and prescriptive analytical tools that apply regression to the data, and Hadoop clusters are deployed on the distributed servers to generate instant responses to queries. Re-engineering parameters are applied when scheduling the data analytical program so that authenticated and authorised information is retrieved from the heterogeneous data environment, and the scalability of the data server can be increased by initialising online and offline spreadsheets and XML documents (Dittrich and Ruiz, 2015). The unified data can be
collected from the parallel processing of the query queue so that the results of the stream processing unit are managed. The validity of the solution depends on the authorisation and authentication protocols used to protect the data arriving through the multiple channels of the heterogeneous environment. The smart applications are synchronised by transforming the data with extraction, loading, and transformation functions applied to the operational data items. The strategy increases the capabilities and interdependencies of the metadata so that the complexity of the data items is resolved, and a decision tree is organised to enhance the decision-making process. The data is delivered on time within the timeline of the real-time architecture proposed for managing the large volume of data and information, and use cases are arranged to analyse the gaps and obscurities in the proposed architecture for arranging the big data in a modular structure. Replicas of the data are kept in different data sets at the server locations (Lee et al., 2018), and the location of the data items can be tracked to increase the storage and processing of the information. A new iteration is scheduled to retrieve the desired result of a query according to the demand raised by the user. The multi-storage building blocks are sequenced so that the data is presented in a structured format, and the information can be tracked through the index list maintained for the metadata and the intermediate results of the responses. Information is fetched through the data storage and acquisition layer, and operations and functions are decomposed into smaller chunks of information to increase the tangibility of the information. The solution is optimised by integrating the intermediate results of the processes and operations performed to generate the required query responses. Data arrivals are measured by initialising the smart devices so that the real-time processing of the information can be identified, and the complexity of the data management program is reduced by minimising the latency period. Clustering approaches are combined with filtering functions to remove redundant information from the data server (Aher and Kulkarni, 2015). Distributed processing on local memory handles the scaling of the read and write operations so that the information organised in the high-volume data sets can be amended, and the data sets are evaluated through a spreadsheet approach that stores the heterogeneous data under a single category. CPU utilisation and memory management help retrieve the information at a much faster rate, the NoSQL database divides the large chunks of data into smaller data sets, and the processing and data extraction speed is accelerated by developing the multi-structured format on the Hadoop distributed file system. The information is recognised by initialising an index hash value that distinctly identifies each data file on the data server.
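As a small illustration of the index hash value mentioned above, the Python sketch below derives an identifier for a data file from its name and content using SHA-256. The exact hashing scheme is an assumption made for illustration; the architecture does not prescribe one.

import hashlib

def index_hash(file_name: str, content: bytes) -> str:
    """Derive a stable hexadecimal identifier for a data file from its name
    and content, so the file can be looked up distinctly on the data server."""
    digest = hashlib.sha256()
    digest.update(file_name.encode("utf-8"))
    digest.update(content)
    return digest.hexdigest()

print(index_hash("inventory_2019_01.csv", b"sku,qty\nA-100,42\n"))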
Chapter 6: Conclusion
It can be concluded that the proposed model for managing the big data of the inventory control system succeeds in providing real-time reports of commodity availability in the warehouse. The user is alerted through signals sent to their mobile phone about underflow and overflow conditions in the data. A distributed environment is developed to optimise object sharing during parallel processing. The research showed that storage area networks and network-attached storage hold the data in the data stores, and the data mining technique is based on the nearest server for finding and extracting information related to the organisation's inventory. The inventory control system should manage commodities in an ordered manner, the inventory can be accessed effectively by users through the e-commerce trading mechanism, and business intelligence is used to increase the efficacy of the general model
of the supply chain management program. Real-time operations and functions support decision making with the help of data mining operations. Data visualization can be performed with technologies such as Dygraphs, ZingChart, Polymaps, Timeline, Exhibit, Modest Maps, Leaflet, Visual.ly, Visualize Free, jQuery Visualize, jqPlot, Many Eyes, JpGraph, Highcharts, and others. In this big data processing model, the batch processing model and the stream processing model are combined. In the batch processing system, data sets are prepared so that the data is stored on a particular data node; the filter function extracts the required data from the large data sets of inventory control information, and the cleaning function removes replicated data from the database. Selecting and extracting the inventory information from the large data sets are the two most important functions for handling the queries placed by the user effectively. New methods are developed by expanding the traditional architectural view for managing big data in a systematic format, and the study of related work on the big data management cycle opens the door for further research into an expanded inventory control system that analyses the presence of commodities in the warehouse.
The servers are arranged in a loosely coupled manner for systematic transmission of data to the clients, and the files are arranged in HDFS according to the name node and data node of the documents. The MapReduce technique is accelerated by the job tracking system, which assigns each process to the relevant data nodes; the location and address of the required data can be tracked by the job tracker, which keeps the process queue between client and server balanced and manages the communication through the master–slave process. The file integrator function defines how the file manager retrieves relevant information from the HDFS blocks: the data is arranged in index order, and each index entry consists of an ID number and the name node together with the file-name description, which makes searching the data straightforward. Jobs are executed by preparing a lookup table for extracting the relevant information from the database, and real-time operations are performed by implementing OLAP (online analytical processing) technology. Mobile devices maintain the connection to the information retrieved from the job tracker, database indexes are prepared for searching the data sequentially, and the Hadoop scheduler arranges the data in heuristic order so that the given tasks are executed systematically. End-to-end scheduling of the processes is
done to optimise the management of resources in a dynamic environment. Sequential management of the files optimises information retrieval: the research shows that the Hadoop distributed file system takes 2.6 seconds to retrieve the files. The computation speed of the memory management system depends on the memory occupied by the particular name node, the total number of cluster blocks, the metadata size, and the index size; an estimated 4000 files can be stored in a single data block, and the file integrator function integrates those files into a single block. The proposed system manages large volumes of data in a structured format so as to speed up the data retrieval process, with the data organised in a distributed file system so that replicated copies are held on different servers. The underpinning data governance model supports an effective data management model, integrated sets of data are arranged to build the data models, and physical models shape the physical, structured storage of the data items. Processing takes place in a highly secured environment, with the main focus on data integrity and reliability with respect to the inventory available in the data warehouse. The applicability of the proposed architecture is assessed by comparing the time taken to search for a file in the local database, and the processing time and computation speed are calculated to measure the efficiency of the proposed database system. The sequencing of the files is optimised by developing Hadoop clusters, 4.2 GB of RAM is allocated to each data node for storing its data, and the documents are prepared in XML format. The inventory management system can be monitored through the proliferation of sensing devices, and the required inventory information can be collected from the large volume of data. The future scope of the research program is to develop security protocols that minimise the impact of vulnerabilities on the information stored in the data warehouse, and a research gap remains in finding the path and process that can preserve the accuracy and reliability of the information.
References
Aboudi, N., and Benhlima, L. (2018). Big Data management for health care systems:
Architecture, requirements, and implementation. International conference on advances in
bioinformatics, 2018 (1), pp. 1-10. Available at:
https://www.researchgate.net/publication/325921328_Big_Data_Management_for_Healthcar
e_Systems_Architecture_Requirements_and_Implementation [Accessed 31 Jan. 2019].
Acharja, D., and Kauser, P. (2016). A survey on big data analytics: Challenges, Open,
Research Issues, and tools. International journal of advanced computer science and
applications, 7(2). Available at: http://thesai.org/Downloads/Volume7No2/Paper_67-
A_Survey_on_Big_Data_Analytics_Challenges.pdf [Accessed 31 Jan. 2019].
Adam, K., Fakharaldien, M., Zain, J., and Majid. M. (2014). “Big data management and
analysis”. International conference on computer engineering and mathematical sciences.
Available at:
https://www.researchgate.net/publication/315670193_Big_Data_Management_and_Analysis
[Accessed 31 Jan. 2019].
Agarwal, S., and Khanam, Z. (2015). Map reduce: A survey paper on recent expansion.
International journal of advanced computer science and applications, 6(8). Available at:
https://pdfs.semanticscholar.org/d97f/b63e53c9da887fc24b1e6c04fa4b942b9696.pdf
[Accessed 31 Jan. 2019].
Aher, S., and Kulkarni, A. (2015). Hadoop map reduce: A programming model for large scale
data processing. 1st ed. Available at: http://www.imedpub.com/articles/hadoop-mapreduce-a-
programming-modelfor-large-scale-data-processing.pdf [Accessed 31 Jan. 2019].
Alam, J., Sajid, A., Talib, R., and Niaz, M. (2014). “A review on the role of big data in
business”. International journal of computer science and mobile computing, 3(4), pp. 446-
453. Available at: https://www.ijcsmc.com/docs/papers/April2014/V3I4201480.pdf
[Accessed 31 Jan. 2019].
Alam, M., and Shakil, K. (2016). Big data analytics in cloud environment using Hadoop. 1st
ed. Available at: https://arxiv.org/ftp/arxiv/papers/1610/1610.04572.pdf [Accessed 31 Jan.
2019].
Beakta, R. (2015). Big Data and Hadoop: A Review paper. 1st ed. Available at:
https://www.researchgate.net/publication/281403776_Big_Data_And_Hadoop_A_Review_P
aper [Accessed 31 Jan. 2019].
Bhosale, H., and Gadekar, D. (2014). A review paper on big data and Hadoop . International
journal of scientific and research publications, 4(10). Available at:
http://www.ijsrp.org/research-paper-1014/ijsrp-p34125.pdf [Accessed 31 Jan. 2019].
Biswas, S., and Sen, J. (2016). A proposed architecture for big data driven supply chain
analytics. 1st ed. Available at: https://arxiv.org/ftp/arxiv/papers/1705/1705.04958.pdf
[Accessed 31 Jan. 2019].
Bjorgvinsson, A. (2010). Distributed cluster pruning in Hadoop. 1st ed. Available at:
https://en.ru.is/media/skjol-td/19_Thesis_AndriMarBjorgvinsson.pdf [Accessed 31 Jan.
2019].
Borkar, V., Carey, M., Grover, R., Onose, N., Vernica, R. (2014). Hyracks: A Flexible and
extensible foundation for data intensive computing. 1st ed. Available at:
https://asterix.ics.uci.edu/pub/ICDE11_conf_full_690.pdf [Accessed 31 Jan. 2019].
Brand, W. (2014). Big Data for dummies. 1st ed. Available at:
https://eecs.wsu.edu/~yinghui/mat/courses/fall%202015/resources/Big%20data%20for
%20dummies.pdf [Accessed 31 Jan. 2019].
Cao, Z., Lin, J., Wan, C., Song, Y., Taylor, G., and Li, M. (2014). Hadoop based framework
for big data analysis of synchronised harmonics in active distributed network. 1st ed.
Available at: https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8118395 [Accessed 31
Jan. 2019].
Chebtko, A., Kashlev, A., and Lu, S. (2017). A big data modelling methodology for Apache
Cassandra. 1st ed. [ebook]. Available at: http://citeseerx.ist.psu.edu/viewdoc/download?
doi=10.1.1.707.3973&rep=rep1&type=pdf [Accessed 31 Jan. 2019].
Chen, H., Chiang, R., and Storey, V. (2016). Business intelligence and analytics: From big
data to big impact. 1st ed. Available at:
https://pdfs.semanticscholar.org/f5fe/b79e04b2e7b61d17a6df79a44faf358e60cd.pdf
[Accessed 31 Jan. 2019].
Choi, T., Chan, H., and Yue, X. (2014). Recent development in big data analytics for
business operations and risks management. 1st ed. Available at:
https://ieeexplore.ieee.org/document/7378465 [Accessed 31 Jan. 2019].
Cognizant. (2014). Big Data is the future of healthcare. 1st ed. Available at:
https://www.cognizant.com/industries-resources/healthcare/Big-Data-is-the-Future-of-
Healthcare.pdf [Accessed 31 Jan. 2019].
Condie, T., Conway, N., Alvaro, P., Hellerstein, J, and Elmeleegy, K. (2015). Map reduce
online. 1st ed. Available at: http://cse.unl.edu/~ylu/csce990/papers/MapReduceOnline.pdf
[Accessed 31 Jan. 2019].
Condie, T., Conway, N., Alvaro, P., Hellerstein, J, and Elmeleegy, K. (2015). An enhanced
DACHE model for map reduce environment . 2nd International conference on big data and
cloud computing, 50(2015) . Available at: https://ac.els-cdn.com/S1877050915005888/1-
s2.0-S1877050915005888-main.pdf?_tid=1bdc7f7e-6a38-4972-9bbd-
eb48189b7545&acdnat=1548224920_ba70f37fa127b2092757b3439575bdb0 [Accessed 31
Jan. 2019].
Dean, J., and Ghemawat, S. (2015). Map reduce: Simplified data processing on the large
clusters. 1st ed. Available at:
https://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-
osdi04.pdf [Accessed 31 Jan. 2019].
Deepika, P., and Raman, A. (2016). Hadoop Map reduce- word count implementation.
International journal on database and storage system, 1(3). Available at:
http://www.ijsdr.org/papers/IJSDR1603025.pdf [Accessed 31 Jan. 2019].
Demchenko, Y., Laat, C., and Membrey, P. (2014). Defining architecture components of the big data ecosystem. International conference on collaboration technologies and systems. [Accessed 31 Jan. 2019].
Dittrich, J., and Ruiz, J. (2015). Efficient big data processing in Hadoop map reduce. 1st ed.
Available at: https://bigdata.uni-saarland.de/publications/BigDataTutorial.pdf [Accessed 31
Jan. 2019].
Elgendy, N., Elragal, A. (2014). Big data analytics: A Literature review paper. 1st ed.
[ebook]. Available at:
https://www.researchgate.net/publication/264555968_Big_Data_Analytics_A_Literature_Re
view_Paper [Accessed 31 Jan. 2019].
Elsayed, A., Ismail, O., Sharkawi, E. (2014). Map Reduce: State of the art and research
directions. International journal of computer and electrical engineering, 6 (1). Available at:
http://www.ijcee.org/papers/789-S0036.pdf [Accessed 31 Jan. 2019].
Gopu, M. (2017). “Big Data and Its applications: A Survey”. International journal of
Pharmaceutical, Biological, and Chemical Sciences, 8(2). Available at:
https://www.researchgate.net/publication/316452050_Big_Data_and_Its_Applications_A_Su
rvey [Accessed 31 Jan. 2019].
Hashem, H., and Ranc, D. (2016). An Integrative modelling of big data processing.
International journal of computer science and applications, 12(1), pp.1-15. Available at:
https://pdfs.semanticscholar.org/2103/54f0c23919750ed59a2c7de388f560fdea4e.pdf
[Accessed 31 Jan. 2019].
Hashler, M., Forrest, J., and Bolanos, M. (2014). Introduction to stream: An extensible
framework for data stream clustering research with R. 1st ed. Available at: https://cran.r-
project.org/web/packages/stream/vignettes/stream.pdf [Accessed 31 Jan. 2019].
Kambhampati, R. (2016). Map reduce using the Hadoop framework. 1st ed. Available at:
https://www.researchgate.net/publication/321145913_Map_Reduce_Using_HADOOP_Fram
ework [Accessed 31 Jan. 2019].
Katsipoulakis, N., Tian, Y., Ozcan F., Reinwald, B., and Pirahesh, H. (2015). A general
solution to integrate SQL and analytics for big data. 1st ed. Available at:
https://openproceedings.org/2015/conf/edbt/paper-57.pdf [Accessed 31 Jan. 2019].
Kaur, I., Kaur, N., Ummat, A., Kaur, J., Kaur, N. (2016). “Research paper on big data and
Hadoop”. International journal on computer science and technology, 7 (4). Available at:
http://www.ijcst.com/vol74/1/11-iqbaldeep-kaur.pdf [Accessed 31 Jan. 2019].
Kaur, M. and Dhaliwal, G. (2015). Performance comparison of Map reduce and Apache
spark on Hadoop for big data analysis. International journal of computer science and
engineering, 3 (11). Available at: https://www.ijcseonline.org/pub_paper/14-IJCSE-01395-
2.pdf [Accessed 31 Jan. 2019].
Khan, S., Shakil, K., and Alam, M. (2015). Cloud based big data analytics: a survey of
current research and future directions. 1st ed. [ebook]. Available at:
https://arxiv.org/ftp/arxiv/papers/1508/1508.04733.pdf [Accessed 31 Jan. 2019].
Kim, B., Kang, B., Choi, S., and Kim, T. (2017). Data modelling versus simulation modelling
in the big data era. 1st ed. Available at:
https://journals.sagepub.com/doi/pdf/10.1177/0037549717692866 [Accessed 31 Jan. 2019].
Koman, G., and Kundrikova, J. (2016). “Application of big data technology in Knowledge
transfer process between business and Academia”. 3rd International journal on business,
Economic, Management, and Tourism., 39 (2016), pp. 605-611. Available at: https://ac.els-
cdn.com/S2212567116303057/1-s2.0-S2212567116303057-main.pdf?_tid=b772418a-cadc-
4be2-9215-27489ba78c82&acdnat=1546876814_ca4b628de7c586bd0155845897511a0c
[Accessed 31 Jan. 2019].
Koseleva, N., and Ropaite, G. (2017). Big Data in building energy efficiency: Understanding
of big data and main challenges. 1st ed. Available at:
https://ac.els-cdn.com/S1877705817305702/1-s2.0-S1877705817305702-main.pdf?
_tid=918645e6-f6b5-41bf-be92-
737ccd1e99b3&acdnat=1547835391_6a00928edc2b788636887c189f46786f [Accessed 31
Jan. 2019].
Kumar, P., Kumar, S., Gowdhaman, T., Shajahaan, S. (2016). A survey on IOT performances
in big data. International journal of computer science and mobile computing, 6 (10), pp. 26-
34. Available at: https://www.ijcsmc.com/docs/papers/October2017/V6I10201707.pdf
[Accessed 31 Jan. 2019].
Kumar, V., and Chaturvedi, A. (2017). Challenges and security issues in implementation of
Hadoop technology in current digital era. International journal of scientific and engineering
research, 8(4). Available at: https://www.ijser.org/researchpaper/Challenges-and-security-
issues-in-implementation-of-Hadoop-technology-in-Current-Digital-Era.pdf [Accessed 31
Jan. 2019].
Landset, S., Khoshgoftaar, T., Richter, A., and Hasanin, T. (2015). A survey of open source
tools for machine learning with big data in the Hadoop ecosystem. 1st ed. Available at:
https://journalofbigdata.springeropen.com/articles/10.1186/s40537-015-0032-1 [Accessed 31
Jan. 2019].
Lee, K., Choi, H., Moon, B. (2018). Research paper on big data Hadoop map reduce job
scheduling. International journal of innovative research in computer and communication
engineering, 6(1). Available at: http://www.ijircce.com/upload/2018/techfleet/19_Research
%20Paper%20on%20Hadoop-MapReduce%20Job%20Scheduling.pdf [Accessed 31 Jan.
2019].
Li, B. (2013). Survey of recent research progress and issues in big data. 1st ed. Available at:
https://www.cse.wustl.edu/~jain/cse570-13/ftp/bigdata2.pdf [Accessed 31 Jan. 2019].
Li, Z., and Yu, Z. (2015). Conceptual model for successful implementation of big data
organizations. International journal of international technology and information
management, 24(2). Available at:
https://pdfs.semanticscholar.org/27a3/da3c859571f23256e4f5d686f2a880570160.pdf
[Accessed 31 Jan. 2019].
Li, Z., and Yu, Z. (2016). Object based storage model for object oriented database. 1st ed.
Available at: https://link.springer.com/content/pdf/10.1007/978-3-540-74784-0_36.pdf
[Accessed 31 Jan. 2019].
Malik, L., Sangwan, S. (2015). Map reduce framework implementation on the prescriptive
analytics on health industry. International journal of computer science and mobile
computing, 4(6). Available at: https://www.ijcsmc.com/docs/papers/June2015/V4I6KJ25.pdf
[Accessed 31 Jan. 2019].
Matthias, Olga, Fouweather, Ian, Gregory,and Vernon. (2014). Making Sense of big data:
Can it transform operation management. 1st ed. Available at:
http://shura.shu.ac.uk/18665/1/Matthias-MakingSenseOfBigData%28AM%29.pdf [Accessed
31 Jan. 2019].
McAfee, A., and Brynjolfsson, E. (2012). Big Data: The Management revolution. 1st ed.
Available at: https://hbr.org/2012/10/big-data-the-management-revolution [Accessed 31 Jan.
2019].
McCreadie, R., Macdonald, C., Ounis, L. (2011). Map reduce indexing strategies: studying
scalability and efficiency. 1st ed. Available at:
http://www.dcs.gla.ac.uk/~richardm/papers/IPM_MapReduce.pdf [Accessed 31 Jan. 2019].
Mo, Z., and Li, Y. (2015). “Research on big data based on the views of technology and
application”. American journal of industrial and business management, 2015 (5), pp. 192-
197. Available at: https://file.scirp.org/pdf/AJIBM_2015042213522819.pdf [Accessed 31
Jan. 2019].
Mukherjee, S., and Shaw, R. (2016). Big data concepts, applications, challenges, and future
scope. International journal of advanced research in computer and communication
engineering, 5 (2). Available at: https://www.ijarcce.com/upload/2016/february-16/IJARCCE
%2015.pdf [Accessed 31 Jan. 2019].
Oussous, A., Benjelloun, F., Lahcen, A., Belfkih, S. (2018). Big Data technologies: A survey.
Journal of King Saud University: Computer and information sciences, 30(4).
Available at: https://www.sciencedirect.com/science/article/pii/S1319157817300034
[Accessed 31 Jan. 2019].
Padhy, R. (2012). Big data processing with Hadoop map reduce in cloud system. 1st ed.
Available at:
https://www.researchgate.net/publication/275405716_Big_Data_Processing_with_Hadoop-
MapReduce_in_Cloud_Systems [Accessed 31 Jan. 2019].
Patil, P., and Bhosale, A. (2018). “Big data analytics”. Open access journal of science, 2 (5).
Available at: https://medcraveonline.com/OAJS/OAJS-02-00095.pdf [Accessed 31 Jan.
2019].
Pol, U. (2016). Big Data analysis using Hadoop Map reduce. American journal of
engineering research, 5(6). Available at:
http://www.ajer.org/papers/v5(06)/U050601460151.pdf [Accessed 31 Jan. 2019].
Polato, I., Re, R., Goldman, A., and Kon, F. (2014). A comprehensive view of Hadoop research: A systematic literature review. Journal of network and computer applications, pp. 1-25. Available at:
https://www.researchgate.net/publication/265296194_A_comprehensive_view_of_Hadoop_r
esearch-A_systematic_literature_review [Accessed 31 Jan. 2019].
Prasad, G., Nagesh, H., and Prabhu, S. (2014). An efficient approach to optimize the
performance of massive small files in Hadoop MapReduce Framework. International journal
of computer science and engineering, 5(6). Available at:
https://www.cognizant.com/industries-resources/healthcare/Big-Data-is-the-Future-of-
Healthcare.pdf [Accessed 31 Jan. 2019].
Rahman, A., Sai, K., Rani, G. (2018). “Challenging tools on research issues in big data
analytics”. IJEDR, 6 (1). Available at: https://www.ijedr.org/papers/IJEDR1801110.pdf
[Accessed 31 Jan. 2019].
Reddy, R., and Priya, B. (2018). Survey on data replication based on Hadoop file system.
International journal of pure and applied mathematics, 118(20). Available at:
https://acadpubl.eu/jsi/2018-118-20/articles/20c/53.pdf [Accessed 31 Jan. 2019].
Ribeiro, A., Silva, A., and Silva, A. (2015). “Data modelling and data analytics: A survey from a big data perspective”. Journal of software engineering and applications, 2015 (8), pp.
617-634. Available at: https://file.scirp.org/pdf/JSEA_2015123014504942.pdf [Accessed 31
Jan. 2019].
Rozados, I., and Tjahjono, B. (2014). Big Data analytics in supply chain management: Trends
and Related research. International conference on operations and supply chain management.
Available at:
https://www.researchgate.net/publication/270506965_Big_Data_Analytics_in_Supply_Chain
_Management_Trends_and_Related_Research [Accessed 31 Jan. 2019].
Saha, B., Shah, H., Seth, S., Vijayaraghavan, G., Murthy, A., Curino, C. (2014). Apache Tez:
A unifying framework for modelling and building data processing applications. 1st ed.
Available at: http://web.eecs.umich.edu/~mosharaf/Readings/Tez.pdf [Accessed 31 Jan.
2019].
Sai, B., and Jyothi, S. (2015). A study on big data modelling techniques. 1st ed. [ebook].
Available at:
https://www.researchgate.net/publication/309194567_A_STUDY_ON_BIG_DATA_MODE
LING_TECHNIQUES [Accessed 31 Jan. 2019].
Senthilkumar, S., Rai, B., Meshram, A., Gunasekaran, A. (2018). “Big Data in Healthcare
management: A Review of literature”. American Journal of theoretical and applied business,
4(2). Available at:
http://article.sciencepublishinggroup.com/pdf/10.11648.j.ajtab.20180402.14.pdf [Accessed
31 Jan. 2019].
Sharma, S., Tim, U., Wong, J., Gadia, S., and Sharma, S. (2014). A brief review of leading
big data models. 1st ed. [ebook]. Available at: https://lib.dr.iastate.edu/cgi/viewcontent.cgi?
article=2055&context=abe_eng_pubs [Accessed 31 Jan. 2019].
Silva, T., Magalhaes, R., Brilhante, I., Macedo, J., Araujo, D., Rego, P., and Neto, A. (2018). “Big Data analytics technologies and platforms: A brief review”. International journal of Latin America Data Science workshop, 3 (4). Available at:
https://pdfs.semanticscholar.org/cdcd/a20d0dc3538b0fcc6999f02481649a829742.pdf
[Accessed 31 Jan. 2019].
Sin, K., and Muthu, L. (2015). “Application of big data in education data mining and learning analytics: A literature review”. International journal of soft computing, 5 (4).
Available at: http://ictactjournals.in/paper/IJSC_V5_I4_paper6_1035_1049.pdf [Accessed 31
Jan. 2019].
Sullivan, P., Clifford, A., and Thompson, G. (2014). Applying data models to big data architectures. International journal of research and development, 58(5/6). Available at:
https://www.researchgate.net/publication/272292507_Applying_data_models_to_big_data_ar
chitectures [Accessed 31 Jan. 2019].
Tan, H., and Chen, L. (2014). An approach for fast and parallel video processing on Apache
Hadoop clusters. 1st ed. Available at:
https://www.researchgate.net/publication/264862451_An_approach_for_fast_and_parallel_vi
deo_processing_on_Apache_Hadoop_clusters [Accessed 31 Jan. 2019].
Thillaieswari, B. (2017). “Comparative study on tools and techniques of big data analysis”.
International journal of advanced networking and applications, 8 (5), pp. 61-66. Available
at: https://www.ijana.in/Special%20Issue/TPID15.pdf [Accessed 31 Jan. 2019].
Tykheev, D. (2016). “Big Data in Marketing”. 1st ed. Available at:
https://www.theseus.fi/bitstream/handle/10024/145613/Big%20Data%20in
%20marketing.pdf?sequence=1&isAllowed=y [Accessed 31 Jan. 2019].
Vijayarani, S., and Sharmila, S. (2016). Research in big data: An overview. International
journal of informatics engineering, 4(3). Available at:
https://pdfs.semanticscholar.org/cdcd/a20d0dc3538b0fcc6999f02481649a829742.pdf
[Accessed 31 Jan. 2019].
Wang, D., Fan, J., Fu, H., and Zhang, B. (2018). Research on optimization of big data
construction engineering quality management. 1st ed. Available at:
https://www.researchgate.net/publication/272292507_Applying_data_models_to_big_data_ar
chitectures [Accessed 31 Jan. 2019].
Wu, D., Sakr, S., Zhu, L. (2017). Big Data storage and data models. 1st ed. Available at:
https://www.researchgate.net/publication/314119498_Big_Data_Storage_and_Data_Models
[Accessed 31 Jan. 2019].
Zaino, J. (2016). Data modelling in the age of NoSQL and Big data. 1st ed. Available at:
https://www.dataversity.net/data-modeling-age-nosql-big-data/ [Accessed 31 Jan. 2019].
Zheng, Z., Wang, P., Liu, J., and Sun, S. (2015). Real time big data processing framework:
challenges and solutions. Applied mathematics and information sciences: An international journal, 9(6). Available at:
http://www.naturalspublishing.com/files/published/v6910010rnl56m.pdf [Accessed 31 Jan.
2019].
Zou, Q., Li, X., Jiang, W., Lin, Z., Li, G., and Chen, K. (2013). Survey of MapReduce frame operation in bioinformatics. Briefings in bioinformatics, 15(4).
[Accessed 31 Jan. 2019].