Data Analysis Research Paper 2022
Added on 2022/09/18
Data Brain Box
Student’s Name
Institutional Affiliation
Abstract
Data analysis is an important component of e-commerce businesses. In the contemporary business environment, businesses have to deal with vast amounts of data. To support the analysis and visualization of business data and the drawing of important conclusions from it, this paper focuses on how to develop a data brain box for that purpose. While several tools are already available for data analysis, the data brain box will be unique in that it will work across different platforms and will be able to analyse data from different sources. Big data analytics is very important in modern e-commerce businesses, as it helps collect and analyse data that can aid a business in making important decisions. The amount of data that businesses have to deal with has grown exponentially in the last few years, and this trend is expected to continue. The methodology will involve seeking the services and opinions of different experts, including software engineers and business leaders. The software will be built using recent artificial intelligence technologies such as neural networks and natural language processing. All applicable local and international laws will be reviewed to ensure compliance. The potential risk of cybercrime will be minimised by ensuring that no security loopholes are left open. In addition, the software will be continuously monitored by a team of cybersecurity experts.
Acknowledgement
Contents
Abstract
Acknowledgement
1.1 Introduction
1.1.1 Overview of Data Mining
1.1.2 Motivation
1.1.3 Aim and Objectives
1.2 Report Structure
2 CHAPTER 2
2.1 Literature Review
2.2 Big Data Analytics
2.2.1 Volume
2.2.2 Velocity
2.2.3 Value
2.2.4 Variety
2.2.5 Veracity
2.3 Data Mining
2.4 Data Mining Steps
2.4.1 BUSINESS UNDERSTANDING
2.4.2 DATA UNDERSTANDING
2.4.3 DATA PREPARATION
2.4.4 MODELING
2.4.5 EVALUATION
2.4.6 DEPLOYMENT
2.5 DATA MINING MODEL AND TECHNIQUES
2.5.1 ASSOCIATION RULES
2.5.2 CLASSIFICATION
2.5.3 ARTIFICIAL NEURAL NETWORK (ANN)
2.5.4 DECISION TREE
2.5.5 RANDOM FOREST
2.6 RELATED WORK
2.6.1 Market Basket Analysis
2.6.1.1 Sales Forecasting
2.6.1.2 Customer Profiling
2.6.1.3 Click Stream Data
2.6.1.4 Use of Data Brain Boxes in the Past
3 CHAPTER 3
3.1 Evaluation Methodology
3.2 Development Methodology
4 CHAPTER 4
4.1 Requirements Analysis
4.1.1 Functional Requirements
4.1.2 Data Collection
4.1.3 Data Storage
4.1.4 Data Analytics
4.1.5 Predictive Analysis
4.2 Non-Functional Requirements
5 CHAPTER 5
5.1 PROFESSIONAL, LEGAL AND ETHICAL ISSUES
5.1.1 Professional Issues and Legal Issues
5.1.2 Ethical Issues
6 CHAPTER 6
6.1 PROJECT PLAN
6.2 METHODOLOGY
6.3 RISK MANAGEMENT
Bibliography
1.1 Introduction
Businesses benefit from having more data about their operations, their customers' wants and, most importantly, the results of their strategy implementation. However, once they have this data, they may not know exactly what to do with it. Not knowing how to utilize data can lead to lost revenue opportunities, lower productivity and effectiveness, and quality issues.
This thesis discusses the under-utilization of data in businesses such as e-commerce companies and how this under-utilized data can be processed into something useful.
E-commerce businesses obtain a lot of information about their customers: data is captured whenever a purchase is made or a product is viewed on a website. Over the past few years, the need for data in the e-commerce industry has increased, because data-driven companies experience higher productivity than their competitors (McAfee & Brynjolfsson, 2012).
A study carried out by the BSA Software Alliance shows that data analysis contributes 15% or more of the growth of 56% of firms. In addition, 91% of Fortune 1000 companies are investing in data analysis projects, an 85% increase over previous years (Akter & Wamba, 2016). At the same time, internet-based technologies give e-commerce companies transformative benefits such as real-time customer service, pricing options and personalized offers. Data mining reinforces these benefits by supporting informed decisions based on critical insights and by allowing companies to use data more efficiently to drive higher customer conversion rates.
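Conversion rate is commonly defined as the share of visits (sessions) that end in a purchase, that is, converted sessions divided by total sessions. The sketch below is illustrative only. It assumes each session is recorded as a list of event names. The event names "view", "add_to_cart" and "purchase" and the function `conversion_rate` are hypothetical and do not come from any particular e-commerce platform.

```python
# Minimal sketch: conversion rate = sessions with a purchase / all sessions.
# Event names ("view", "add_to_cart", "purchase") are hypothetical placeholders.
from __future__ import annotations


def conversion_rate(sessions: list[list[str]]) -> float:
    """Return the fraction of sessions that contain at least one "purchase" event."""
    if not sessions:
        return 0.0  # avoid division by zero when there is no traffic
    converted = sum(1 for events in sessions if "purchase" in events)
    return converted / len(sessions)


if __name__ == "__main__":
    # Toy data: four sessions, two of which end in a purchase.
    sessions = [
        ["view", "view", "purchase"],
        ["view"],
        ["view", "add_to_cart"],
        ["view", "purchase"],
    ]
    print(f"Conversion rate: {conversion_rate(sessions):.0%}")  # prints "Conversion rate: 50%"
```

A session with several purchases still counts once, so the rate stays between 0 and 1.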
It is therefore very important for e-commerce businesses to have a smart way of gaining insight into what consumers want to see when they visit the site, in order to get the best out of the business. The objective of this project is to develop a data brain box that provides data collection, data transformation, data storage and visualization.
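The four stages named in this objective can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the project's actual design:

- the hard-coded events, the `product_stats` table and the functions `collect`, `transform`, `store` and `visualize` are hypothetical;
- SQLite stands in for the data store;
- a plain-text bar chart stands in for a real visualization tool.

```python
# Minimal sketch of a "data brain box" pipeline: collect -> transform -> store -> visualize.
# All table, column and function names are hypothetical; a production system would
# use real event streams, a proper database and a charting library.
from __future__ import annotations

import sqlite3
from collections import Counter


def collect() -> list[tuple[str, str]]:
    """Data collection: return (event_type, product_id) pairs.

    Hard-coded toy events stand in for website logs or order records.
    """
    return [
        ("view", "shoes"), ("view", "shoes"), ("purchase", "shoes"),
        ("view", "hat"), ("view", "hat"), ("view", "hat"),
        ("view", "bag"), ("purchase", "bag"),
    ]


def transform(events: list[tuple[str, str]]) -> list[tuple[str, int, int]]:
    """Data transformation: aggregate raw events into (product, views, purchases)."""
    views = Counter(p for e, p in events if e == "view")
    purchases = Counter(p for e, p in events if e == "purchase")
    products = sorted(set(views) | set(purchases))
    # Counter returns 0 for missing keys, so every product gets both counts.
    return [(p, views[p], purchases[p]) for p in products]


def store(rows: list[tuple[str, int, int]], conn: sqlite3.Connection) -> None:
    """Data storage: write the aggregates to a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS product_stats "
        "(product TEXT PRIMARY KEY, views INTEGER, purchases INTEGER)"
    )
    conn.executemany("INSERT OR REPLACE INTO product_stats VALUES (?, ?, ?)", rows)
    conn.commit()


def visualize(conn: sqlite3.Connection) -> str:
    """Visualization: render views per product as a plain-text bar chart."""
    lines = []
    for product, views, purchases in conn.execute(
        "SELECT product, views, purchases FROM product_stats ORDER BY views DESC"
    ):
        lines.append(f"{product:<6} {'#' * views} ({purchases} bought)")
    return "\n".join(lines)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory database for the demo
    store(transform(collect()), conn)
    print(visualize(conn))
```

Each stage is a separate function. Any one of them, such as the storage back end, could be replaced without changing the others.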
1.1.1 Overview of Data Mining
The 1973 Webster’s New Collegiate Dictionary defines data as “factual information used as a basis for reasoning, discussion, or calculation.” The 1996 version of the Webster dictionary defines data as “information, especially information organized for analysis” (Migrant & Seasonal Head Start Technical Assistance Center, n.d.).
From these definitions, a more practical definition is that data is a collection of numbers, characters, images or other recordings, in a form that can be assessed to make a decision about a specific action. By analysing data closely, we can find patterns that yield information, which can in turn be used to build knowledge (Migrant & Seasonal Head Start Technical Assistance Center, n.d.).
Data mining is a form of business intelligence and data analysis. It is the process of digging into large, often unstructured data sets to extract useful correlations or predictions (Han & Kamber, 2011).
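The next sketch illustrates extracting a correlation and a prediction from data. It computes the Pearson correlation between daily page views and daily orders, then fits a least-squares line to forecast orders. Both are standard statistical techniques chosen here only for illustration. The page-view and order figures are made-up toy values, and the function names `pearson` and `fit_line` are hypothetical.

```python
# Minimal sketch: finding a correlation and making a simple prediction.
# Pearson correlation and ordinary least-squares regression are used here as
# illustrative techniques; the daily figures below are toy values.
from __future__ import annotations

import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation of two equal-length, non-constant series (range -1 to 1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Least-squares slope and intercept for the line y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


if __name__ == "__main__":
    page_views = [120, 150, 170, 200, 240]  # toy daily page views
    orders = [6, 8, 9, 10, 12]              # toy daily orders
    print(f"correlation: {pearson(page_views, orders):.3f}")
    slope, intercept = fit_line(page_views, orders)
    print(f"predicted orders at 300 views: {slope * 300 + intercept:.1f}")
```

A strong correlation alone does not show that page views cause orders. It only indicates that the two move together, which is a starting point for further analysis.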
1.1.2 Motivation
As a product designer who has worked with several start-up businesses in the e-commerce industry, I have realized from my experience over the years that most e-commerce companies have no clear idea of what to do once their website or application has been developed, beyond uploading products and selling to the few consumers they have access to. Some do not even know the true value of the data they get from their sales. The motivation for this thesis is therefore to bridge this gap in data utilization.
1.1.3 Aim and Objectives
The aim of this dissertation is to investigate effective ways in which businesses can utilize available data to increase sales and return on investment. Core to this investigation will be data mining techniques and the algorithms that can support this task, including, but not limited to, machine learning methods such as neural networks and decision trees. This paper also aims to develop a prototype data brain box that can help e-commerce businesses collect relevant data and use it to the advantage of the business.
1.2 Report Structure
The report is structured as follows.
Chapter 2- This chapter comprises the literature review, which summarizes various algorithms and technologies for data mining, data warehousing, data visualization and predictive analysis.
Chapter 3- This chapter describes the evaluation and development methodology of the project.
Chapter 4- This chapter presents the requirements analysis, covering the functional and non-functional requirements of the project.
Chapter 5- This chapter describes the professional, legal, ethical and social issues that can be associated with the project.
Chapter 6- This chapter presents the project plan, methodology and risk management of the project.
2 CHAPTER 2
2.1 Literature Review
This chapter reviews the literature on data mining and predictive analysis for business marketing data. We introduce some of the core techniques, concepts and solutions for data mining needed to meet the aims and objectives of this project. In contemporary society, technology has been integrated into almost every facet of our lives. Businesses, which make up a significant share of the organizations that exist today, have not been left behind (Lowndes et al., 2017). With technology more advanced now than at any previous point in history, it is very important for businesses to take advantage of it to increase their sales and, consequently, maximize their profits (Gupta & George, 2016).
Let us take a simple example. Several billion people in the world use smartphones, and these people are likely to search online for the products or services they need. Businesses can tap into this opportunity to their own advantage. It is worth noting, however, that many businesses have not invested in data mining and data analysis (Tan, Steinbach & Kumar, 2016). If businesses collected and analysed data and used it to make important predictions, their chances of success would increase. Given the highly competitive nature of business in the modern world, it is only wise for businesses to venture into data collection and analysis. For this reason, this paper investigates ways in which e-commerce businesses can tap into the huge amounts of data that exist, make sense of this data and use it to make important predictions and decisions about their businesses.
There is extensive evidence that businesses with effective social media marketing are more likely to succeed than counterparts that have not invested in this kind of marketing (Jackson, 2019). It is therefore important for e-commerce businesses to consider building a strong social media presence (Dai, Wong, Wang, Zheng & Vasilakos, 2019). In fact, it would be appropriate for them to consider hiring a
social media marketing team. This team would focus on integrating social media sites into the e-commerce website. It would also be responsible for posting appropriate information and responding to any queries or issues that potential clients raise (Eldén, 2019).
The main goal of such a team would be to use social media to convert potential customers into paying customers. In addition to the tasks described above, the team should analyse data on factors such as traffic and the age and location of potential customers. Here is an example of how such analytics might work. Suppose the team discovers that most of the people buying from the business belong to a certain age group. Based on that finding, the business may dedicate more resources to targeting that age group, which is likely to result in more sales because the most responsive group is being targeted.
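As an illustration of the age-group analysis described above, the following Python sketch computes the purchase conversion rate (purchases divided by visits) for each age band and picks the band with the highest rate. The record layout and the ten-year bands are hypothetical assumptions, not a format the paper specifies.

```python
from collections import defaultdict

# Hypothetical visit records: (visitor age, whether the visit ended in a purchase).
visits = [
    (19, False), (23, True), (27, True), (31, False),
    (35, True), (42, False), (48, False), (55, False),
    (24, True), (29, False), (38, False), (61, False),
]

def age_band(age: int) -> str:
    """Map an age to an assumed ten-year band label such as '20-29'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"

def conversion_by_band(records):
    """Return {band: purchases / visits} for every band that has visits."""
    totals = defaultdict(int)
    purchases = defaultdict(int)
    for age, purchased in records:
        band = age_band(age)
        totals[band] += 1
        if purchased:
            purchases[band] += 1
    return {band: purchases[band] / totals[band] for band in totals}

rates = conversion_by_band(visits)
best_band = max(rates, key=rates.get)
print(rates)
print("Target band:", best_band)
```

With real traffic data, bands containing only a handful of visits can produce misleadingly high or low rates, so a minimum sample size per band would usually be enforced before acting on the result.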
Email marketing is another tool that can be integrated into e-commerce websites to help collect useful data about customers (Steels & Brooks, 2018). With a mailing tool, the business can send promotional messages to the customers most likely to respond. Email marketing may provide a distinctive kind of data to the business and may help increase sales (Hassabis, Kumaran, Summerfield, & Botvinick, 2017). Consider a specific example. Suppose a customer visits an e-commerce site, places an order for an item and subscribes to the mailing list. Email can then be used to update the customer on the status of the order, from purchase to shipping. If the same customer becomes a regular buyer, email marketing data can reveal this, and the business may use that information to offer the loyal customer incentives. For instance, the business might email the customer a 10 percent discount on their next purchase.
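The loyalty example can be made concrete with a short sketch. It counts each customer's orders in an order log and flags customers who reach a repeat-purchase threshold for the 10 percent discount email mentioned above. The threshold of three orders, the log format and the `discount_offers` helper are hypothetical choices for illustration.

```python
from collections import Counter

# Hypothetical order log: one customer email address per completed order.
orders = [
    "ann@example.com", "ben@example.com", "ann@example.com",
    "cara@example.com", "ann@example.com", "ben@example.com",
    "ben@example.com", "dev@example.com",
]

LOYALTY_THRESHOLD = 3   # assumed: three or more orders marks a regular customer
DISCOUNT_PERCENT = 10   # the 10 percent incentive from the text

def discount_offers(order_log, threshold=LOYALTY_THRESHOLD):
    """Return (email, message) pairs for customers with at least `threshold` orders."""
    counts = Counter(order_log)
    offers = []
    for email, n in sorted(counts.items()):
        if n >= threshold:
            message = (f"Thank you for your {n} orders! Enjoy {DISCOUNT_PERCENT}% "
                       "off your next purchase.")
            offers.append((email, message))
    return offers

for email, message in discount_offers(orders):
    print(email, "->", message)
```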
In short, there are numerous ways in which businesses can collect data, analyse it, visualize it and use it to make important business decisions. It is therefore essential for businesses to adopt the tools that exist for making such
important moves. The information in this section shows that businesses need appropriate tools to manage available data in ways that help achieve business goals. It follows that the data brain box, which is designed specifically to help businesses achieve this, is not only a good idea but one that is vital in the modern economy. Because we are in the age of artificial intelligence, tools such as machine learning and neural networks can help create data mining software that is faster and more effective. The following section reviews further literature, which shows what has already been done, what gaps may exist and what could be done better (Blum, Hopcroft & Kannan, 2020).
Data brain boxes were uncommon in the fourth and earlier generations of computers (Alpaydin, 2020). They have become more common with the fifth generation, that is, knowledge-based systems. It is expected that artificial intelligence, which will be an integral part of the fourth industrial revolution, will bring an exponential rise in data brain boxes and related technology. Knowledge-based systems, which are essential for building effective data brain boxes, are expected to power the fourth industrial revolution (Mohri, Rostamizadeh & Talwalkar, 2018). As the discussion above shows, businesses already use this technology to make important business decisions.
2.2 Big Data Analytics
The term “Big Data” refers to datasets too large to work with using traditional database management systems. Their size makes it difficult for commonly used software tools and storage devices to capture, manage and store the data, and their complexity and volume mean that analysis takes longer (Kubick, 2012).
The amount of data collected by businesses grows exponentially, with individual datasets ranging from dozens of terabytes (TB) to many petabytes (PB). Problems posed by this volume of data include capturing, storing, searching, sharing, analysing and visualizing it. Businesses now explore these volumes of data to discover knowledge that can help them grow (Russom, 2011).
Of the many problems created by Big Data, one of the biggest is that data is spread across different applications within a business organization. Scattered in this way, the data is of limited use, but if it is merged and processed, a new dataset can be created that brings value to the business. To help extract value from such large amounts of data, big data is commonly characterized using the five V’s.
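The point about data scattered across applications can be illustrated with a minimal sketch that merges records from three hypothetical sources (an order system, a mailing-list tool and a social media export) into one customer-level dataset. It assumes every system identifies customers by the same ID; in practice, matching records across systems is often the hardest part of the merge.

```python
# Hypothetical extracts from three separate applications, keyed by customer ID.
orders = {"c1": {"orders": 5}, "c2": {"orders": 1}}
mailing_list = {"c1": {"subscribed": True}, "c3": {"subscribed": True}}
social = {"c2": {"followers_engaged": 12}, "c3": {"followers_engaged": 4}}

def merge_sources(*sources):
    """Outer-join several {customer_id: fields} dicts into one unified dataset.

    Customers missing from a source simply lack that source's fields.
    """
    merged = {}
    for source in sources:
        for customer_id, fields in source.items():
            merged.setdefault(customer_id, {}).update(fields)
    return merged

unified = merge_sources(orders, mailing_list, social)
for customer_id in sorted(unified):
    print(customer_id, unified[customer_id])
```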
When managed well, big data analytics may open up immense opportunities for an e-commerce business. Modern techniques make it possible to analyse relatively large data sets and draw conclusions that help business organizations make well-informed decisions. Big data analytics may be viewed as an advanced form of analysis involving complex applications, statistical algorithms and elements such as predictive models (Roiger, 2017). These analytics also support what-if analysis and run on what are referred to as high-performance analytics systems.
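To illustrate what-if analysis, here is a small Python sketch that projects revenue under several hypothetical price changes. It assumes a constant price elasticity of demand, so that units sold scale as (new price / old price) raised to the elasticity; the baseline price, volume and the elasticity of -1.5 are invented for illustration and are not drawn from the sources cited.

```python
def projected_revenue(base_price, base_units, price_change, elasticity):
    """Revenue after a relative price change under constant-elasticity demand.

    price_change is a fraction, e.g. -0.10 for a 10% price cut.
    Units scale as (new_price / base_price) ** elasticity.
    """
    new_price = base_price * (1 + price_change)
    new_units = base_units * (new_price / base_price) ** elasticity
    return new_price * new_units

BASE_PRICE = 20.0    # hypothetical current price
BASE_UNITS = 1000    # hypothetical units sold per month at that price
ELASTICITY = -1.5    # assumed elasticity of demand

for change in (-0.10, 0.0, 0.10):
    revenue = projected_revenue(BASE_PRICE, BASE_UNITS, change, ELASTICITY)
    print(f"price change {change:+.0%}: projected revenue {revenue:,.2f}")
```

Under this assumed elasticity, demand is elastic, so a price cut raises projected revenue and a price rise lowers it; a different elasticity estimate could reverse the conclusion, which is exactly what a what-if comparison is meant to expose.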
An e-commerce business can gain numerous advantages from using big data, some of which are explained here. As already observed, these analytics handle large data sets, and exploiting those data sets can help a business seize new opportunities. The analytics may uncover opportunities that were never thought to exist (Marjani, Nasaruddin, Gani, Karim, Hashem, Siddiqa & Yaqoob, 2017). It follows that businesses seeking increased profits, improved operations and satisfied customers have good reason to adopt big data analytics. Businesses need these analytics because the amount of available data is very large and continues to grow exponentially (Lu, Li, Chen, Kim & Serikawa, 2018). It is important for a business to understand the kind of data that is
generated through its operations. If this information is not analysed, it goes to waste, denying the business highly valuable insight (Pappas, Mikalef, Giannakos, Krogstie & Lekakos, 2018). Fortunately for businesses, good tools exist to help analyse this data (Leskovec, Rajaraman & Ullman, 2020), and where suitable tools do not exist, they can be developed. In the past, businesses had to hire a whole team to carry out analytics; today, modern software performs these tasks quickly and reliably.
Big data analytics may help an e-commerce business gain a deeper understanding of the market. Because these systems use high-speed memory and can analyse data in real time, important market information can be made available to the business almost instantly. Since the market is central to any business, a tool that provides timely market information is a major gain for e-commerce businesses. With such information, these businesses can deliver products more efficiently and manage deadlines more easily.
Big data analytics can also help a business gain a good understanding of its industry. Because these analytics can draw on industry knowledge, they can provide information that helps a business make important decisions about the future (Acemoglu & Restrepo, 2018). The analytics can also provide information about prevailing economic conditions, which can inform a business's expansion plans. Such expansion not only helps the business grow but also helps it build a strong brand. Although the economy is constantly changing and businesses must continuously adapt to new environments, the main goal of any business remains profit maximization. Big data analytics distils refined information from data sets, helping a business focus on the areas that maximize profits.
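The idea that refined information helps a business focus on its most profitable areas can be sketched as a simple aggregation: total the profit (revenue minus cost) of each product category from transaction records and rank the categories. The categories and figures below are hypothetical.

```python
from collections import defaultdict

# Hypothetical transactions: (product category, revenue, cost of goods).
transactions = [
    ("electronics", 1200.0, 950.0),
    ("books", 300.0, 180.0),
    ("electronics", 800.0, 640.0),
    ("clothing", 500.0, 260.0),
    ("books", 150.0, 95.0),
]

def profit_by_category(rows):
    """Sum revenue minus cost per category and return categories ranked by profit."""
    profit = defaultdict(float)
    for category, revenue, cost in rows:
        profit[category] += revenue - cost
    return sorted(profit.items(), key=lambda item: item[1], reverse=True)

for category, total in profit_by_category(transactions):
    print(f"{category:<12} {total:>8.2f}")
```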
In light of the observations made above, there is no doubt that big data analytics is essential to an e-commerce business. It is reasonable to conclude that big data
analytics remains one of the most important tools that businesses can use to stay relevant in a constantly changing world. In almost all facets of life, data analytics has changed the way people behave, and changes in people's lives translate into changes in the way business is done; a business that resists change risks being phased out. Big data analytics can be described as the new age of data, and it offers vast potential for businesses. In the future, it will be very important for businesses to make consistent use of the available tools to maximize profits and make important business decisions. Some of the most successful businesses, including Amazon and Google, invest heavily in big data analytics. The following section examines the five V’s of big data to provide insight into what big data is and why it is important.
2.2.1 Volume
Volume represents the most immediate challenge to the conventional business. Many businesses already hold large amounts of stored data but have no means of turning it into something meaningful. Russom (2011) illustrated that big data analysis draws on large volumes of data, usually measured in petabytes and exabytes, which are used to make strategic marketing decisions. The data generated is often unstructured, including videos, images and data from mobile technologies, and it is unlikely to be clean and free from errors. Although this may complicate data preparation, big data enables real-time decision making for e-commerce firms (Kang, et al., 2003). For example, Amazon developed a sophisticated recommendation engine that generates over 35% of its sales, automated customer service systems intended to ensure high customer satisfaction, and dynamic pricing systems that adjust prices against competing sites every 15 seconds (Goff, et al., 2012).
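Because big data is rarely clean, preparation usually comes before analysis. The sketch below shows a few common pre-processing steps on hypothetical order records: dropping exact duplicates, normalizing inconsistent text, and discarding rows with missing or impossible values. The field names and validity rules are assumptions for illustration; real pipelines choose rules that suit their own data.

```python
import math

# Hypothetical raw order records with typical quality problems.
raw_orders = [
    {"order_id": "A1", "country": " uk ", "amount": "19.99"},
    {"order_id": "A1", "country": " uk ", "amount": "19.99"},   # exact duplicate
    {"order_id": "A2", "country": "UK", "amount": "n/a"},        # unparseable amount
    {"order_id": "A3", "country": "us", "amount": "-5"},         # impossible value
    {"order_id": "A4", "country": "US", "amount": "42.50"},
    {"order_id": "A5", "country": None, "amount": "10.00"},      # missing field
]

def clean_orders(records):
    """Deduplicate, normalize and validate records; return the usable ones."""
    seen = set()
    cleaned = []
    for rec in records:
        key = tuple(sorted((k, str(v)) for k, v in rec.items()))
        if key in seen:
            continue                      # skip exact duplicates
        seen.add(key)
        if not rec.get("country"):
            continue                      # skip rows with a missing country
        try:
            amount = float(rec["amount"])
        except (TypeError, ValueError):
            continue                      # skip amounts that do not parse as numbers
        if not math.isfinite(amount) or amount <= 0:
            continue                      # skip NaN, infinite or non-positive values
        cleaned.append({
            "order_id": rec["order_id"],
            "country": rec["country"].strip().upper(),
            "amount": amount,
        })
    return cleaned

print(clean_orders(raw_orders))
```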
There is extensive evidence that the amount of data companies manage has been increasing exponentially. As Ahmed et al. (2017) observe, it is estimated that since 2012 the amount of data businesses have to deal with has been doubling every 40 months. Consequently, businesses need an effective way of analysing this immense volume of data. As noted above, this data is not free of errors; however, big data analytics tools offer ways of pre-processing data so that most of these errors are handled effectively (Aydiner, Tatoglu, Bayraktar, Zaim & Delen, 2019). These analytics can also process immense volumes of data very rapidly (Tegmark, 2017). Today, cloud storage helps with storing such data (Donoho, 2017), since storing vast amounts of data on traditional media such as hard disk drives or even solid-state drives could be overwhelming and costly for businesses, and neither effective nor sustainable (Hofmann & Klinkenberg, 2016). It follows that businesses need tools that analyse large volumes of data efficiently and cost-effectively, and big data analytical tools are the best solution in this respect.
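The doubling estimate cited above implies a simple exponential growth model. Assuming a steady doubling time of 40 months (the figure attributed to Ahmed et al., 2017), a starting volume `V0` grows to `V0 * 2 ** (t / 40)` after `t` months. The Python sketch below evaluates this model; the starting volume of 50 TB is hypothetical.

```python
import math

DOUBLING_MONTHS = 40  # doubling time cited in the text (Ahmed et al., 2017)

def projected_volume(initial_tb, months, doubling_months=DOUBLING_MONTHS):
    """Volume after `months` if data doubles every `doubling_months` months."""
    return initial_tb * 2 ** (months / doubling_months)

def annual_growth_rate(doubling_months=DOUBLING_MONTHS):
    """Equivalent compound growth per 12 months, as a fraction."""
    return 2 ** (12 / doubling_months) - 1

def months_to_reach(initial_tb, target_tb, doubling_months=DOUBLING_MONTHS):
    """Months needed to grow from initial_tb to target_tb."""
    return doubling_months * math.log2(target_tb / initial_tb)

start = 50.0  # hypothetical current holdings in terabytes
print(f"after 10 years: {projected_volume(start, 120):.0f} TB")  # 120 months = 3 doublings
print(f"annual growth: {annual_growth_rate():.1%}")
print(f"months to reach 1 PB (1000 TB): {months_to_reach(start, 1000):.1f}")
```

Under this assumption, data volume grows roughly 23 percent per year and eightfold per decade, which is why storage and processing capacity planned for today's volume quickly becomes inadequate.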
2.2.2 Velocity
Velocity is the speed at which data is created, that is, the speed at which new data is generated, processed, stored and analysed. The velocity of big data needs to be prioritized and synchronized with business processes, decision making and performance improvement (Beuike, 2011).
Capitalizing on this high rate of data processing, many e-commerce firms use different techniques to add value to their businesses. For example, eBay has performed several experiments that exploit data velocity on different aspects of its website, which resulted in better
layout and website features. E-commerce businesses now use high-end systems to collect, store and analyse data in order to make real-time decisions.
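Website experiments like those described above are often run as A/B tests, in which visitors are split between two page layouts and the conversion rates are compared. The sketch below applies a two-proportion z-test to such a comparison; it illustrates the general technique and is not a description of eBay's actual method. The visitor and conversion counts are hypothetical.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for H0: both layouts convert at the same rate."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical experiment: layout A (current) vs layout B (new).
z, p = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
print(f"z = {z:.2f}, p = {p:.4f}")
if p < 0.05:
    print("Layout B's conversion rate differs significantly from A's.")
```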
As Wamba, Gunasekaran, Akter, Ren, Dubey & Childe (2017) argue, the velocity of information may be even more important than its volume. As Bichler, Heinzl & van (2017) observe, the main aim of an e-commerce business with respect to velocity should be to ensure that information is received and analysed quickly (as close to real time as possible). In many cases it is better to analyse less data at a relatively high rate than to analyse huge amounts of data slowly. There is evidence that velocity matters more than volume because it gives businesses a greater competitive advantage (Laursen & Thorlund, 2016). It follows that an e-commerce business should strive to handle large amounts of data as quickly as possible. However, where the business has to choose between volume and velocity, it may be preferable to analyse small amounts of data at high velocity rather than large amounts of data that slow the analysis down (Mikalef, Pappas, Krogstie & Pavlou, 2019). Consider an example from healthcare. Suppose clinicians are using a big data analytics tool to receive information from patients. That information must flow quickly, because delays may mean the difference between life and death.
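The trade-off described above, analysing a smaller, recent slice of data quickly rather than all data slowly, is often implemented as a sliding time window over an event stream. The following sketch keeps only events from the last `window_seconds` and updates a running order count and revenue total as each event arrives, so each update stays cheap regardless of how much history has accumulated. The event format, the `SlidingWindow` class and the 60-second window are hypothetical.

```python
from collections import deque

class SlidingWindow:
    """Running totals over events from the last `window_seconds` seconds."""

    def __init__(self, window_seconds=60):
        self.window = window_seconds
        self.events = deque()   # (timestamp, amount) pairs, oldest first
        self.total = 0.0

    def add(self, timestamp, amount):
        """Record an event, then evict events at least `window` seconds old."""
        self.events.append((timestamp, amount))
        self.total += amount
        self._evict(timestamp)

    def _evict(self, now):
        while self.events and self.events[0][0] <= now - self.window:
            _, old_amount = self.events.popleft()
            self.total -= old_amount

    def snapshot(self):
        """Return (number of events in window, summed amount)."""
        return len(self.events), self.total

# Hypothetical purchase events: (seconds since start, order value).
stream = [(0, 20.0), (15, 35.0), (50, 10.0), (70, 25.0), (125, 40.0)]
window = SlidingWindow(window_seconds=60)
for ts, amount in stream:
    window.add(ts, amount)
    count, revenue = window.snapshot()
    print(f"t={ts:>3}s  orders in last 60s: {count}  revenue: {revenue:.2f}")
```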
2.2.3 Value
Value is the worth and usefulness of the data collected. Having access to data is all well and good, but unless it can be turned into something useful, it is worthless. Value can therefore be seen as the ability to transform the vast amounts of available data into business. After a business has invested a great deal of time and resources, big data analytics can help ensure that the available data is used to identify potential customers and convert them into paying customers (Vidgen, Shaw & Grant, 2017). Big data can
be used to understand customers better and offer them what they need at the right time (Duan, Cao & Edwards, 2020). For instance, a business can categorize customers by their behaviour: customers who constantly cancel their orders could be classified as less important to the business, while loyal customers should be identified and rewarded.
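The behavioural categorization described above can be sketched as a simple rule-based segmentation. Each customer's history is summarised as completed and cancelled orders, and customers are labelled "loyal", "high-cancellation" or "standard". The thresholds (at least five completed orders; a cancellation rate above 50 percent) and the `CustomerHistory` type are hypothetical and would need tuning against real data.

```python
from dataclasses import dataclass

@dataclass
class CustomerHistory:
    customer_id: str
    completed: int   # orders delivered and kept
    cancelled: int   # orders cancelled before fulfilment

LOYAL_MIN_COMPLETED = 5   # assumed threshold for a loyal customer
HIGH_CANCEL_RATE = 0.5    # assumed threshold for frequent cancellers

def segment(history: CustomerHistory) -> str:
    """Label a customer from their order history using simple rules."""
    total = history.completed + history.cancelled
    cancel_rate = history.cancelled / total if total else 0.0
    if cancel_rate > HIGH_CANCEL_RATE:
        return "high-cancellation"
    if history.completed >= LOYAL_MIN_COMPLETED:
        return "loyal"
    return "standard"

customers = [
    CustomerHistory("c1", completed=8, cancelled=1),
    CustomerHistory("c2", completed=1, cancelled=4),
    CustomerHistory("c3", completed=2, cancelled=0),
]
for c in customers:
    print(c.customer_id, segment(c))
```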
2.2.4 Variety
This refers to the different types of data being generated, which may be structured or
unstructured. Variety is a critical attribute in data analysis because data is generated from
multiple sources and in multiple formats (Russom, 2011). This requires different analytical and
predictive models so that information about different functional areas can be used. For
example, the analytical models used by some e-commerce firms may draw on customer
information, historical data on customer purchases or buying behaviour, seasonal shopping
patterns and, above all, data retrieved from social media to make market predictions
(Biesdorf et al., 2013).
Variety may also refer to the different sources of data, such as email marketing, social
networks, mobile phones and websites. The information received from each source differs in
nature. According to Chambers (2018), a business should clearly understand its most
important sources of data and put mechanisms in place to analyse them. For example, suppose
an e-commerce company sells movies and drama. Social media would be a very important
medium for such a company to collect and transfer information, and it may also find email
marketing useful. When analysing email marketing data, the focus should be not on how many
people have subscribed but on who those people are. Analysis of the email data may show that
a large share of the subscribers to the video platform are young people. Such information
could help the company focus more on youth in its advertisements and other activities.
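A minimal sketch of handling variety: records arriving in different formats (here CSV from an email list and JSON from a social network) are mapped into one common schema before analysis, and the share of young subscribers is then computed across both sources. The sample data, the field names and the youth age range of 15 to 24 are all assumptions made for illustration.

```python
import csv
import io
import json

# Hypothetical raw inputs: similar information arriving in different formats.
EMAIL_CSV = "email,age\nann@example.com,19\nbob@example.com,42\n"
SOCIAL_JSON = '[{"handle": "@cat", "profile": {"age": 22}}]'


def from_email(text: str) -> list[dict]:
    """Map CSV rows from the email list into the common schema."""
    return [{"source": "email", "id": row["email"], "age": int(row["age"])}
            for row in csv.DictReader(io.StringIO(text))]


def from_social(text: str) -> list[dict]:
    """Map nested JSON records from a social network into the common schema."""
    return [{"source": "social", "id": item["handle"], "age": item["profile"]["age"]}
            for item in json.loads(text)]


def youth_share(records: list[dict], low: int = 15, high: int = 24) -> float:
    """Fraction of records whose age falls in the assumed youth range [low, high]."""
    if not records:
        return 0.0
    youth = sum(1 for r in records if low <= r["age"] <= high)
    return youth / len(records)


records = from_email(EMAIL_CSV) + from_social(SOCIAL_JSON)
print(len(records), round(youth_share(records), 2))  # 3 0.67
```

Once every source is normalised into the same shape, the same analysis code can run over all of them, which is one practical way of coping with variety.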
2.2.5 Veracity
This refers to uncertainty in the data generated. Data can be messy, which makes it
untrustworthy, so it requires careful verification to improve predictability in the e-commerce
environment (Beuike, 2011). Data verification is essential in data analysis because bad data
can hinder productive decision making and add little or no value to the business. Beuike
(2011) suggested a form of automation for verifying data so that it can be used for better
decision making.
Schroeck et al. (2012), however, argued that data fusion could be used to combine several less
reliable data sources in order to generate more precise data.
It is important for a business to ensure that it relies on meaningful data. Clean and accurate
data is at the core of reliable analysis (Cao, 2017). According to Menke (2018), numerous
tools exist to help ensure that the data being analysed is clean and accurate. Artificial
intelligence and tools such as natural language processing can make this task smoother.
Natural language processing toolkits can break down data and identify important aspects
such as the parts of speech used and even bias in different kinds of information (Appelbaum,
Kogan, Vasarhelyi & Yan, 2017). These models can be trained so that only data meeting
certain criteria is analysed (Ashrafi, Ravasan, Trkman & Afshari, 2019). Such analysis helps
ensure that the data can be relied upon. As seen in the previous sections, businesses use this
kind of data to make important decisions, and making decisions with flawed data could be
catastrophic. Veracity of data is therefore very important. If businesses can use big
data analytics to ensure veracity and the other Vs explained above, their chances of success
are likely to increase.
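The two veracity ideas in this section, rule-based verification and data fusion, can be sketched together. This is a minimal illustration under assumptions: the validation rules and order fields are hypothetical, and taking the median of several readings is just one simple fusion rule, not necessarily the method Schroeck et al. (2012) describe.

```python
from statistics import median

# Hypothetical validation rules: each field maps to a check its value must pass.
RULES = {
    "order_id": lambda v: isinstance(v, str) and v != "",
    "quantity": lambda v: isinstance(v, int) and v > 0,
    "price": lambda v: isinstance(v, (int, float)) and v >= 0,
}


def is_valid(record: dict) -> bool:
    """True only if every ruled field is present and passes its check."""
    return all(field in record and check(record[field]) for field, check in RULES.items())


def fuse(readings: list[float]) -> float:
    """Combine several less reliable readings of one value by taking their median."""
    return median(readings)


orders = [
    {"order_id": "A1", "quantity": 2, "price": 9.99},
    {"order_id": "", "quantity": 1, "price": 5.0},     # missing ID
    {"order_id": "A3", "quantity": -4, "price": 3.5},  # impossible quantity
]
clean = [o for o in orders if is_valid(o)]
print([o["order_id"] for o in clean])  # ['A1']

# Three sources report the same product's stock level; one is clearly wrong.
print(fuse([120, 118, 500]))  # 120
```

The median is robust to a single badly wrong source, which is why it is used here; weighted schemes that trust some sources more than others are a common alternative.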
2.3 Data Mining
As already defined in the previous chapter, data mining is a form of business intelligence and
data analysis. It is the process of digging into large, often unstructured data to extract useful
correlations or predictions (Han & Kamber, 2011). David J. Hand (2012) defined data mining as
a discipline that interacts with statistics, database technology, pattern recognition and
machine learning, and that is concerned with the secondary analysis of large databases to find
unsuspected relationships.
Fig. 1. Data mining relationship (Hand, 1998)
Technological developments have made it easier to transform raw data into knowledge that
responds to management and marketing needs and creates opportunities for return on
investment, which has in turn pushed businesses to invest in data mining (Shearer, 2000).
2.4 Data Mining Steps
FIG 2 (4)
2.4.1 BUSINESS UNDERSTANDING
To plan data mining properly, it is essential to understand what you are trying to achieve with
your business (Peterson, 2018). A business may want to mine different kinds of information
depending on its nature. For instance, it may want to understand the mean age of its loyal
customers, the gender distribution of those customers and their geographical location. Such
information could help the business make important decisions.
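The business questions given as examples (mean age, gender distribution and location of loyal customers) translate directly into simple summary statistics. The sketch below assumes hypothetical customer records in which loyalty has already been flagged upstream; the field names and region labels are invented for illustration.

```python
from collections import Counter
from statistics import mean

# Hypothetical customer records; "loyal" is assumed to be flagged already.
customers = [
    {"age": 34, "gender": "F", "region": "North", "loyal": True},
    {"age": 27, "gender": "M", "region": "South", "loyal": True},
    {"age": 45, "gender": "F", "region": "North", "loyal": True},
    {"age": 19, "gender": "M", "region": "East", "loyal": False},
]


def loyal_profile(rows: list[dict]) -> dict:
    """Summarise the loyal customers: mean age, gender counts and region counts."""
    loyal = [r for r in rows if r["loyal"]]
    if not loyal:
        raise ValueError("no loyal customers to profile")
    return {
        "mean_age": mean(r["age"] for r in loyal),
        "gender": Counter(r["gender"] for r in loyal),
        "region": Counter(r["region"] for r in loyal),
    }


profile = loyal_profile(customers)
print(round(profile["mean_age"], 1))  # 35.3
print(profile["gender"])              # Counter({'F': 2, 'M': 1})
print(profile["region"])              # Counter({'North': 2, 'South': 1})
```

Writing the questions down as concrete computations like these is one way to check that the business objectives are specific enough to guide the later mining steps.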
2.4.2 DATA UNDERSTANDING
The first steps in understanding the data are collecting, loading and integrating the data from
all sources into one place. The collated data should be clean, coherent and consolidated so that
mining produces efficient results (Peterson, 2018). This step is very important because it
makes no sense, and is almost impossible, to analyse data that one does not understand.
2.4.3 DATA PREPARATION
This is one of the most time-consuming steps in data mining and requires considerable
attention. At this stage the data is cleaned, extracted and formatted so that it can be mined.
The outcome of this step is the final data set used in modelling (Peterson, 2018).
Tools such as the Natural Language Toolkit (NLTK) can help carry out this task. Such tools
perform functions such as stop-word removal, part-of-speech tagging and other important
aspects of preparing text data. Data preparation is very important because it helps achieve
reliable results (Chiang, Grover, Liang & Zhang, 2018).
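A minimal text-preparation sketch showing lower-casing, tokenisation and stop-word removal. To keep it self-contained it uses a deliberately tiny, hand-written stop-word list and a regular-expression tokeniser; NLTK provides a much fuller stop-word list (`nltk.corpus.stopwords`) and a part-of-speech tagger (`nltk.pos_tag`), both of which require downloading NLTK data first.

```python
import re

# Deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "was", "and", "or", "to", "of", "it", "this", "i"}


def prepare(text: str) -> list[str]:
    """Lower-case the text, split it into word tokens and drop stop words."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


print(prepare("The delivery was fast and the laptop is great!"))
# ['delivery', 'fast', 'laptop', 'great']
```

The output is a clean token list, which is the kind of "final data set" the modelling step expects.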
2.4.4 MODELING
In this stage, the modelling techniques to be used on the prepared data set are selected, and
test scenarios are created to test the validity of the chosen model (Peterson, 2018). Different
models can be used to make sense of the available data, including word embeddings, topic
models and other tools that map words to vectors (Seddon, Constantinidis, Tamm & Dod,
2017). Modelling is an important step in helping a computer extract the most meaning from a
corpus of data (Vershynin, 2018).
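Mapping words to vectors can be illustrated with the simplest such model, a bag-of-words representation compared by cosine similarity. This is a deliberately simple stand-in, not a word-embedding or topic model: each document becomes a vector of word counts over a shared vocabulary, and documents that share words get a higher similarity. The sample documents are hypothetical.

```python
import math
from collections import Counter


def to_vector(tokens: list[str], vocabulary: list[str]) -> list[int]:
    """Bag-of-words: count how often each vocabulary word occurs in the document."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def cosine(u: list[int], v: list[int]) -> float:
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


docs = [
    ["fast", "delivery", "great", "laptop"],
    ["great", "laptop", "battery"],
    ["late", "delivery", "refund"],
]
vocab = sorted({w for d in docs for w in d})
vectors = [to_vector(d, vocab) for d in docs]
print(round(cosine(vectors[0], vectors[1]), 3))  # 0.577
print(round(cosine(vectors[0], vectors[2]), 3))  # 0.289
```

Word embeddings replace these sparse count vectors with dense learned vectors, so that related words (not only identical ones) end up close together.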
2.4.5 EVALUATION
After the modelling has been done and the tests have been carried out, the data mining
process may reveal new patterns, and the initial business objectives identified in the first step
may need to be re-evaluated and revised. This is a continuous process, as more understanding
of the business is gained through data mining (Peterson, 2018). Evaluation is very important
because it helps ensure efficiency and accuracy before deployment (Van 2016).
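Checking accuracy before deployment usually means comparing a model's predictions with the known outcomes on held-out test data that was not used for training. The sketch below computes three common measures for a binary task; the label encoding (1 for the positive class) and the sample predictions are assumptions for illustration.

```python
def evaluate(actual: list[int], predicted: list[int]) -> dict:
    """Accuracy, precision and recall for a binary task where 1 is the positive class."""
    pairs = list(zip(actual, predicted))
    if not pairs:
        raise ValueError("need at least one prediction to evaluate")
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives
    correct = sum(1 for a, p in pairs if a == p)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical held-out labels (1 = customer bought) and model predictions.
result = evaluate([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
print({k: round(v, 3) for k, v in result.items()})
# {'accuracy': 0.667, 'precision': 0.667, 'recall': 0.667}
```

If these figures fall short of what the business objectives require, the process loops back to earlier steps, as the text describes.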
2.4.6 DEPLOYMENT
The business insights obtained from mining the data are presented in such a way that
stakeholders can use them effectively (Peterson, 2018).
This is a crucial step and also the last one in the process: once the data has been mined and
the results evaluated, important decisions can be derived from them.
2.5 DATA MINING MODEL AND TECHNIQUES
There are several models and techniques in data mining. The most common ones are discussed
below:
2.5.1 ASSOCIATION RULES
This data mining technique helps find associations between two or more items, discovering
hidden patterns in a data set. Association rules are similar to the notion of co-occurrence in
machine learning: the presence of one item indicates the likelihood of another. The statistical
concept of correlation is also similar to the notion of association, in that the analysis shows a
relationship between events in the data.
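Association rules are usually scored by support (how often the items appear together) and confidence (how often the consequent appears given the antecedent). The sketch below enumerates every item pair exhaustively, which is fine for a handful of items; algorithms such as Apriori or FP-Growth are used at scale. The baskets and thresholds are hypothetical.

```python
from itertools import combinations

# Hypothetical transactions: each set is the items in one basket.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]


def support(itemset: set, transactions: list[set]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def rules(transactions: list[set], min_support: float = 0.5, min_confidence: float = 0.6):
    """Yield (antecedent, consequent, support, confidence) for item pairs above both thresholds."""
    items = sorted(set().union(*transactions))
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support({lhs}, transactions)
            if confidence >= min_confidence:
                yield lhs, rhs, pair_support, confidence


for lhs, rhs, s, c in rules(baskets):
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
# bread -> butter: support=0.50, confidence=0.67
# butter -> bread: support=0.50, confidence=1.00
# bread -> milk: support=0.50, confidence=0.67
# milk -> bread: support=0.50, confidence=0.67
```

Note that a rule's confidence is directional: "butter -> bread" holds in every butter basket, while "bread -> butter" holds in only two of three bread baskets.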
2.5.2 CLASSIFICATION
Classification is a supervised machine learning approach in which the program learns from
input data to classify new observations. In classification, labelled input/output pairs from a
training data set are used to predict the output for new observations.
Classification techniques can be used to detect inconsistencies in marketing data. For
example, the coronavirus pandemic caused a sudden spike in toilet paper sales. This type of
data needs to be excluded from the training data in machine learning so that the algorithm is
not based on misleading data.
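The text describes excluding unusual data, such as pandemic-driven sales spikes, from the training set. One simple way to do this, shown here as a statistical pre-filter rather than a classifier, is the modified z-score based on the median absolute deviation (MAD). The 0.6745 constant and the 3.5 cut-off are a common rule of thumb and are assumptions here; the weekly sales figures are invented.

```python
from statistics import median


def flag_spikes(values: list[float], threshold: float = 3.5) -> list[bool]:
    """Flag values whose modified z-score (based on the median absolute deviation) exceeds threshold."""
    centre = median(values)
    mad = median(abs(v - centre) for v in values)
    if mad == 0:
        return [False] * len(values)  # no spread, so nothing stands out
    return [abs(0.6745 * (v - centre) / mad) > threshold for v in values]


weekly_sales = [100, 104, 98, 101, 97, 103, 620, 99]  # one pandemic-style spike
flags = flag_spikes(weekly_sales)
training = [v for v, spike in zip(weekly_sales, flags) if not spike]
print(training)  # [100, 104, 98, 101, 97, 103, 99]
```

Medians are used instead of the mean and standard deviation because a single huge spike would inflate those and partly hide itself.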
2.5.3 ARTIFICIAL NEURAL NETWORK (ANN)
An artificial neural network is one of the most widely used classification methods in machine
learning. ANNs are loosely inspired by the way neurons work in the human nervous system.
Neural networks date back to the 1940s and 1950s and have gained wide popularity as available
computing power has grown. Because an ANN is modelled on the human brain, it can learn from
past data and respond with predictions or classifications.
An advantage of ANNs is that they learn from example data sets. Their most common use is
function approximation, which makes them a cost-effective way of approximating the function
or distribution that underlies the data. The scikit-learn library for Python can be used to
implement a neural network.
A basic ANN has three kinds of layer: the input layer, the hidden layer and the output layer.
The input layer, which is the first layer, accepts inputs in various forms and passes them to the
hidden layer in the middle. The hidden layer performs mathematical computations on the input
data, recognises patterns in it and passes the result to the output layer.
Fig. 3. Neural network
However, ANNs have a disadvantage: there are no specific rules for defining their structure, so
the best structure varies from problem to problem and is usually found through trial and error.
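The flow from input layer to hidden layer to output layer can be shown as a forward pass through a tiny network. The weights below are hand-picked and hypothetical; in practice they are learned from training data, for example with scikit-learn's `MLPClassifier` (in `sklearn.neural_network`). The sigmoid activation is one common choice, assumed here for simplicity.

```python
import math


def sigmoid(x: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs: list[float], weights: list[list[float]], biases: list[float]) -> list[float]:
    """Fully connected layer: each neuron takes a weighted sum of its inputs plus a bias, then a sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(inputs, hidden_w, hidden_b, out_w, out_b):
    hidden = layer(inputs, hidden_w, hidden_b)  # input layer -> hidden layer
    return layer(hidden, out_w, out_b)          # hidden layer -> output layer


# Hypothetical weights: 2 inputs, 2 hidden neurons, 1 output neuron.
hidden_w = [[1.0, -1.0], [-1.0, 1.0]]
hidden_b = [0.0, 0.0]
out_w = [[2.0, 2.0]]
out_b = [-2.0]
print(forward([0.0, 0.0], hidden_w, hidden_b, out_w, out_b))  # [0.5]
```

Choosing how many hidden layers and neurons to use is exactly the structural decision that, as noted above, usually comes down to trial and error.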
2.5.4 DECISION TREE
A decision tree is a decision support tool and one of the most popular classification algorithms
in data mining. It uses a tree-like graph or model of decisions and their possible consequences.
Applying a decision tree to a data set produces a set of rules that can be used to classify the
data. Applications include evaluating brand expansions using historical data and determining
the likelihood that buyers will purchase a product from demographic data, so that limited
advertising budgets can be targeted.
An advantage of decision trees is that they can handle both numerical and categorical data;
they are also easy to understand and require little data preparation. A disadvantage is that
decision trees are unstable, because a small variation in the data might result in a completely
different tree.
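The core step in growing a decision tree is choosing the split that best separates the classes. The sketch below finds the best "value <= threshold" split on one numeric feature using Gini impurity, as CART-style trees do; a full tree repeats this recursively on each side. The age and purchase data are hypothetical.

```python
def gini(labels: list[str]) -> float:
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values: list[float], labels: list[str]) -> tuple[float, float]:
    """Return (threshold, weighted impurity) of the best 'value <= threshold' split."""
    n = len(values)
    best = (values[0], float("inf"))
    for t in sorted(set(values)):
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = (t, score)
    return best


ages = [22, 25, 31, 38, 45, 52]
bought = ["no", "no", "no", "yes", "yes", "yes"]
print(best_split(ages, bought))  # (31, 0.0)
```

The resulting rule, "age <= 31 predicts no", is the kind of readable rule that makes decision trees easy to understand; a small change in the data can move this threshold, which illustrates their instability.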
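2.5.5 RANDOM FOREST
A random forest trains several decision trees on different subsets of the data and combines
their results (averaging for regression, a majority vote for classification) to improve the
accuracy of the model. Random forests are more accurate than a single decision tree in most
cases. However, they are more complex, more computationally demanding and harder to
interpret than a single tree.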
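A simplified random forest sketch, under clear assumptions: each "tree" is a one-split stump rather than a full decision tree, each is trained on a bootstrap sample (rows drawn with replacement) using one randomly chosen feature, and the forest classifies by majority vote. The two-feature data is hypothetical. A production forest, such as scikit-learn's `RandomForestClassifier`, grows full trees and samples features at every split.

```python
import random
from collections import Counter


def train_stump(rows, labels, feature):
    """Fit a one-split tree on a single feature, choosing the threshold with the fewest training errors."""
    best = None
    for t in sorted({r[feature] for r in rows}):
        left = Counter(lab for r, lab in zip(rows, labels) if r[feature] <= t)
        right = Counter(lab for r, lab in zip(rows, labels) if r[feature] > t)
        left_label = left.most_common(1)[0][0]
        right_label = right.most_common(1)[0][0] if right else left_label
        errors = (sum(left.values()) - left[left_label]) + (sum(right.values()) - right[right_label])
        if best is None or errors < best[0]:
            best = (errors, t, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda row: left_label if row[feature] <= threshold else right_label


def train_forest(rows, labels, n_trees=25, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a randomly chosen feature."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in range(len(rows))]  # sample with replacement
        feature = rng.randrange(n_features)
        forest.append(train_stump([rows[i] for i in idx], [labels[i] for i in idx], feature))
    return forest


def predict(forest, row):
    """Classify by majority vote across the trees."""
    return Counter(tree(row) for tree in forest).most_common(1)[0][0]


rows = [(1, 2), (2, 1), (3, 3), (2, 2), (1, 3), (10, 11), (11, 10), (12, 12), (11, 11), (10, 12)]
labels = ["low"] * 5 + ["high"] * 5
forest = train_forest(rows, labels)
print(predict(forest, (0, 0)), predict(forest, (20, 20)))  # low high
```

Because each tree sees a different sample and feature, individual trees make different mistakes, and the vote averages many of those mistakes away.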
2.5.6 K-MEANS CLUSTERING
The K-means algorithm is one of the simplest unsupervised learning algorithms for solving
clustering problems. It partitions a data set into a chosen number of clusters, k, in a simple
way. The main idea of K-means clustering is to define k centres, one for each cluster, assign
each data point to its nearest centre and then move each centre to the mean of its points,
repeating until the assignments stop changing. It is mainly used to find groups in data sets that
have not been labelled and to find patterns that support better decisions.
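A minimal pure-Python sketch of K-means (Lloyd's algorithm): pick k initial centres, assign each point to the nearest centre, move each centre to the mean of its assigned points, and stop when the centres no longer change. Initialising from k randomly chosen data points is one common choice and is an assumption here; the two-dimensional points (for example, spend and visit frequency) are hypothetical.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centre assignment and mean updates until stable."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)  # initial centres: k distinct data points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centres[i]))
            clusters[nearest].append(p)
        new_centres = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centres[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centres == centres:  # assignments have stopped changing
            break
        centres = new_centres
    return centres, clusters


points = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (8.5, 9), (9, 8.5)]
centres, clusters = kmeans(points, k=2)
print(sorted(centres))  # [(1.5, 1.5), (8.5, 8.5)]
```

K-means can settle on a poor grouping when clusters overlap, so in practice it is often run several times from different starting centres and the best result is kept.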
2.6 RELATED WORK
2.6.1 Market Basket Analysis
One widely used data mining technique is market basket analysis. When sales are processed,
information about customers and the products they buy is transferred to an electronic medium.
The data collected at the point of sale is called market basket data and includes fields such as
product ID, transaction number, quantity and price. The purpose of the analysis is to find
relationships between the items sold and to derive plans or rules from them. Understanding
these relationships can help increase company profits.
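Point-of-sale data is stored line by line, so the first practical step in market basket analysis is to group line items into baskets by transaction number and then count which products appear together. The sketch below does this with hypothetical line items in the (transaction number, product ID, quantity, price) shape mentioned above; the product IDs are invented.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical point-of-sale line items: (transaction number, product ID, quantity, price).
line_items = [
    (1001, "P-bread", 1, 1.20),
    (1001, "P-milk", 2, 0.90),
    (1002, "P-bread", 1, 1.20),
    (1002, "P-milk", 1, 0.90),
    (1002, "P-jam", 1, 2.50),
    (1003, "P-jam", 1, 2.50),
]


def build_baskets(items):
    """Group line items by transaction number into sets of product IDs."""
    baskets = defaultdict(set)
    for txn, product, _qty, _price in items:
        baskets[txn].add(product)
    return baskets


def pair_counts(baskets):
    """Count how many baskets contain each pair of products."""
    counts = Counter()
    for products in baskets.values():
        counts.update(combinations(sorted(products), 2))
    return counts


print(pair_counts(build_baskets(line_items)).most_common(1))
# [(('P-bread', 'P-milk'), 2)]
```

These pair counts are the raw material for the support and confidence measures used by association rules.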
2.6.1.1 Sales Forecasting
Retailers use sales forecasting for inventory control, answering questions such as when
customers will shop again after their last purchase. In this case, data mining is used to
determine customers' shopping habits as prices rise and, in marketing, to determine the
relationships between cross-selling and product
sales. It also helps determine which products customers buy according to their customer
profile.
Sales forecasting is an important part of marketing planning because forecasting is necessary
for effective marketing decisions. Sales forecasts form the basis of marketing programs,
budgets, expense schedules and procurement plans.
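As a baseline illustration of sales forecasting, the sketch below predicts the next period as the average of the most recent periods (a simple moving average). This is an assumed, deliberately simple method; real forecasts would also model trend and seasonality, such as the holiday peaks mentioned later in this section. The monthly figures are hypothetical.

```python
def moving_average_forecast(sales: list[float], window: int = 3) -> float:
    """Forecast the next period as the mean of the last `window` periods."""
    if len(sales) < window:
        raise ValueError("need at least `window` observations")
    recent = sales[-window:]
    return sum(recent) / window


monthly_units = [120, 130, 125, 140, 150, 145]
print(moving_average_forecast(monthly_units))  # 145.0
```

A larger window smooths out noise but reacts more slowly to genuine changes in demand, so the window length is itself a planning decision.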
2.6.1.2 Customer Profiling
Customer profiling, also known as customer strategy, allows e-commerce businesses to use
customer data to plan business activities and operations through data mining. This in turn
helps the business develop new products and services to suit customer needs. The tool also
helps companies understand customers' habits while they are online, for example whether they
are shopping or just browsing.
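Distinguishing shopping from browsing can be sketched as a rule over the events recorded in a session: a session that contains any purchase-intent event counts as shopping. The event names and the rule itself are hypothetical; a real site would define its own event vocabulary and might learn the distinction with a classifier instead.

```python
# Hypothetical click-event names signalling purchase intent.
SHOPPING_EVENTS = {"add_to_cart", "begin_checkout", "purchase"}


def session_intent(events: list[str]) -> str:
    """Label a session 'shopping' if it contains any purchase-intent event, else 'browsing'."""
    return "shopping" if any(e in SHOPPING_EVENTS for e in events) else "browsing"


sessions = {
    "s1": ["view_home", "view_product", "add_to_cart"],
    "s2": ["view_home", "search", "view_product"],
}
print({sid: session_intent(ev) for sid, ev in sessions.items()})
# {'s1': 'shopping', 's2': 'browsing'}
```

Aggregating these labels per customer over time gives one simple profile attribute, such as the share of visits that involve shopping.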
2.6.1.3 Click Stream Data
Clickstream data refers to data generated from the websites, online advertisements and social
media content of e-commerce businesses. Online advertisements and social media play an
important role in businesses' promotional strategies, and clickstream data is important in
making informed strategic and tactical decisions. Many e-commerce firms are reported to rely
on this data in their efforts to capture information about customers, and it can further help
predict customer preferences and tastes. According to research by Davenport and Harris,
Netflix captures and analyses about a billion web data points on movies that customers like
and dislike in order to understand what customers want.
2.6.1.4 Use of Data Brain Boxes in the Past
Although data brain boxes are not yet common, there have been several cases where they have
been used. One notable example is the Alibaba e-commerce website. Alibaba is an online
business that sells products to different parts of the globe. Because of the huge volume of
business data it handles, Alibaba has over the years developed a data brain box system to
manage that data (Yuan, 2018). The system is designed so that different kinds of analytics can
be performed on the available data. Consider some specific examples of how this complex
system works. Suppose a customer visits the site and searches for a particular product, say a
laptop. The system's algorithms gather data from across the website and present a range of
products relevant to that customer (Zhou, 2017). If the customer does not make a purchase,
the system can create ads tailored to that potential customer and send them by email or to the
app the customer uses to access the site. The system can also monitor which products are in
high demand and forecast the approximate number of products that will be purchased, which
helps Alibaba personnel prepare. It can also forecast expected changes in future sales; for
instance, it may use previous data to predict that certain items will be in greater demand at a
certain time of year (such as Easter). This information is used to prepare adequately for the
future. The system handles many other tasks as well. These examples show that it is an
advanced data brain box system that uses various algorithms, most of which are developed
using machine learning. Artificial intelligence is changing the way businesses carry out their
activities.
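Presenting products related to what a customer searched for can be approximated with a simple item-to-item approach: count how often products are viewed in the same session and recommend the most frequent companions. This is a generic illustration, not Alibaba's actual algorithm, which the cited sources do not specify at code level; the session data and product names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical browsing sessions: products each visitor viewed together.
sessions = [
    ["laptop", "laptop_bag", "mouse"],
    ["laptop", "mouse"],
    ["laptop", "mouse", "charger"],
    ["laptop", "charger"],
    ["phone", "charger"],
]


def co_view_index(sessions):
    """For each product, count how often every other product was viewed in the same session."""
    index = defaultdict(Counter)
    for viewed in sessions:
        for a, b in permutations(set(viewed), 2):
            index[a][b] += 1
    return index


def recommend(index, product, n=2):
    """Return up to n products most often co-viewed with `product`."""
    return [item for item, _ in index.get(product, Counter()).most_common(n)]


index = co_view_index(sessions)
print(recommend(index, "laptop"))  # ['mouse', 'charger']
```

Co-view counts favour popular items, so real recommenders typically normalise them or combine them with each customer's own history.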
Data brain boxes have also been used by major online sites such as YouTube. In the last few
years, YouTube has been improved with an intelligent data brain box system (Real, Shlens,
Mazzocchi, Pan & Vanhoucke, 2017). The system collects large amounts
of data from users and then uses it for various purposes. For instance, it can provide a customized playlist based on the content a user has been watching, making it easier for the user to navigate to videos of interest. The system also uses subscription and watch-history data to decide which advertisements to show a user. For example, if a user has been searching for an online programming course, the system can use this data to show the user customized advertisements for such courses. The system therefore benefits the user while helping YouTube increase its revenue.
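The playlist personalisation described above can be sketched as content-based filtering: build an interest profile from the tags of videos a user has watched, then rank unwatched videos by Jaccard similarity to that profile. This is an illustrative assumption, not YouTube's documented method; the tag catalogue, video ids and function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |A & B| / |A | B| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_videos(watch_history, catalogue, top_n=2):
    """Rank unwatched videos by tag overlap with everything the user has watched.

    `watch_history` is a list of video ids; `catalogue` maps video id -> set of tags.
    """
    # Merge the tags of all watched videos into one interest profile.
    profile = set()
    for video_id in watch_history:
        profile |= catalogue.get(video_id, set())
    watched = set(watch_history)
    candidates = [v for v in catalogue if v not in watched]
    # Highest similarity first; ties broken by video id for a stable order.
    candidates.sort(key=lambda v: (-jaccard(profile, catalogue[v]), v))
    return candidates[:top_n]


if __name__ == "__main__":
    # Hypothetical tag catalogue.
    catalogue = {
        "v1": {"python", "programming", "tutorial"},
        "v2": {"python", "data", "tutorial"},
        "v3": {"cooking", "recipe"},
        "v4": {"programming", "course"},
    }
    print(recommend_videos(["v1"], catalogue))  # ['v2', 'v4']
```

The same profile could also drive advert selection, for example by matching it against tags attached to each advertisement.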
Amazon is a renowned e-commerce business and one of the world leaders in e-commerce. The company has a well-established data management system that could be described as a data brain box because of several of its features. Take, for example, the chatbots integrated within the system. When a customer visits the site and needs help, they may receive it from a chatbot, which is programmed to carry out the role of customer support. The system has also been useful in forecasting sales, and it can efficiently handle large amounts of data to help the company make important predictions (Varambally, 2020). Data brain boxes are also increasingly used in international hotels, where they help project how many customers to expect, the amounts involved, and how to prepare for them. In some cases, these systems even let customers make bookings without the help of staff.
3 CHAPTER 3
3.1 Evaluation Methodology
The main criteria for evaluating the effectiveness of our solution are as follows:
The system is able to successfully collect data from e-commerce websites (Magento and WooCommerce).
The system is able to store the collected data on a cloud-based server (Amazon Redshift).
The system is able to provide business analysis when queried, that is, to give insights into what other shops are selling the most.
The system is able to provide predictive analysis of what sells the most in certain seasons, so as to allow e-commerce site owners to plan ahead.
3.2 Development Methodology
The development of the prototype combines agile and rapid prototyping methodologies to accomplish the requirements listed above.
4 CHAPTER 4
4.1 Requirements Analysis
The intended outcome of this research is a prototype data brain box that provides marketing insight for e-commerce companies in the lifestyle sector.
4.1.1 Functional Requirements
The data brain box should be able to analyse vast amounts of data for e-commerce businesses. It should be compatible with common operating systems and other computer systems.
4.1.2 Data Collection –
Data for the proposed system will be obtained from people who shop online frequently, in order to understand their shopping patterns and habits. Data will also be collected from e-commerce websites running on Magento and WooCommerce.
A questionnaire will be designed to collect this information. Its questions will revolve around people's shopping habits on e-commerce websites, focusing in particular on the factors that determine customers' buying habits.
For the CMS and Magento websites, a special request will be made to their owners to grant access to their customer service dashboards. Strict ethical standards, such as maintaining privacy, will be upheld.
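As an illustration of the collection step, the sketch below pulls orders from a WooCommerce store through its REST API (`/wp-json/wc/v3/orders`, authenticated with a consumer key and secret over HTTPS) and flattens each order's line items into rows. The store URL, keys and helper names are placeholders; the sketch fetches a single page only, and Magento would need a separate adapter for its own REST API, which is not shown.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_orders_request(store_url, consumer_key, consumer_secret, page=1, per_page=100):
    """Build an authenticated GET request for one page of WooCommerce orders.

    Uses HTTP Basic authentication with the consumer key and secret,
    which is only safe over HTTPS.
    """
    query = urllib.parse.urlencode({"page": page, "per_page": per_page})
    url = f"{store_url.rstrip('/')}/wp-json/wc/v3/orders?{query}"
    token = base64.b64encode(f"{consumer_key}:{consumer_secret}".encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def flatten_line_items(orders):
    """Turn order JSON into flat (order_id, date, product, quantity, total) rows."""
    rows = []
    for order in orders:
        for item in order.get("line_items", []):
            rows.append((
                order["id"],
                order["date_created"],
                item["name"],
                int(item["quantity"]),
                float(item["total"]),  # WooCommerce returns money values as strings
            ))
    return rows


def fetch_orders(store_url, consumer_key, consumer_secret, page=1):
    """Fetch and flatten one page of orders (needs network access and valid keys)."""
    request = build_orders_request(store_url, consumer_key, consumer_secret, page)
    with urllib.request.urlopen(request, timeout=30) as response:
        return flatten_line_items(json.load(response))


if __name__ == "__main__":
    # Placeholder credentials; nothing is sent unless fetch_orders() is called.
    req = build_orders_request("https://shop.example.com", "ck_xxx", "cs_xxx")
    print(req.full_url)
    sample = [{"id": 7, "date_created": "2020-03-01T10:00:00",
               "line_items": [{"name": "Yoga mat", "quantity": 2, "total": "40.00"}]}]
    print(flatten_line_items(sample))
```

Keeping the HTTP request separate from the JSON flattening lets the flattening logic be tested without network access or store credentials.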
4.1.3 Data Storage –
The collected data will be stored on Amazon Redshift, a fast, petabyte-scale cloud data warehouse. A free trial of the service is available for two months; since it is a powerful tool, two months should be sufficient to collect the appropriate data.
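The storage layout and the "what are shops selling the most" query can be sketched as follows. Redshift itself is reached through PostgreSQL-compatible drivers and is usually bulk-loaded with its COPY command; Python's built-in `sqlite3` is used here only as a local stand-in so the sketch runs anywhere, and the table and column names are hypothetical.

```python
import sqlite3

# Hypothetical table for flattened order lines collected from several shops.
SCHEMA = """
CREATE TABLE IF NOT EXISTS order_lines (
    shop       TEXT    NOT NULL,
    order_id   INTEGER NOT NULL,
    order_date TEXT    NOT NULL,  -- ISO 8601 date, e.g. '2020-03-01'
    product    TEXT    NOT NULL,
    quantity   INTEGER NOT NULL,
    total      REAL    NOT NULL
)
"""


def store_rows(conn, rows):
    """Create the table if needed, insert order lines and commit."""
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO order_lines VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()


def top_sellers(conn, limit=3):
    """Return (product, units_sold) pairs across all shops, best sellers first."""
    query = """
        SELECT product, SUM(quantity) AS units
        FROM order_lines
        GROUP BY product
        ORDER BY units DESC, product
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store_rows(conn, [
        ("shop_a", 1, "2020-03-01", "yoga mat", 2, 40.0),
        ("shop_b", 2, "2020-03-02", "yoga mat", 1, 20.0),
        ("shop_b", 3, "2020-03-02", "water bottle", 5, 25.0),
    ])
    print(top_sellers(conn))  # [('water bottle', 5), ('yoga mat', 3)]
```

The aggregate query is plain SQL and should carry over to Redshift with only the parameter placeholder style changed for whichever driver is used.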
4.1.4 Data Analytics –
The system processes the data obtained from the sources mentioned above, identifies consumer trends, and classifies each trend for business intelligence purposes.
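The requirement does not specify how trends are identified or classified. One simple, transparent option, assumed here purely for illustration, is to compare each product's sales in the latest window of weeks with the window before it and label the change as rising, falling or stable; the window size, the 10% threshold and the function names are hypothetical.

```python
def classify_trend(weekly_units, window=4, threshold=0.10):
    """Label demand as 'rising', 'falling' or 'stable'.

    Compares total units in the most recent `window` weeks with the
    `window` weeks before that; a relative change beyond `threshold`
    counts as a trend.
    """
    if len(weekly_units) < 2 * window:
        raise ValueError("need at least two full windows of data")
    recent = sum(weekly_units[-window:])
    previous = sum(weekly_units[-2 * window:-window])
    if previous == 0:
        return "rising" if recent > 0 else "stable"
    change = (recent - previous) / previous
    if change > threshold:
        return "rising"
    if change < -threshold:
        return "falling"
    return "stable"


def classify_all(sales_by_product, **kwargs):
    """Apply classify_trend to every product in a {product: [weekly units]} mapping."""
    return {product: classify_trend(units, **kwargs)
            for product, units in sales_by_product.items()}


if __name__ == "__main__":
    # Hypothetical eight weeks of unit sales per product.
    sales = {
        "yoga mat": [10, 10, 10, 10, 12, 14, 15, 16],
        "water bottle": [20, 20, 20, 20, 15, 15, 14, 14],
        "resistance band": [5, 5, 5, 5, 5, 5, 5, 6],
    }
    print(classify_all(sales))
```

A window comparison is easy to explain to shop owners; it could later be replaced by a statistical test or a fitted model without changing the output labels.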
4.1.5 Predictive Analysis –
The data collected in the stages described above will then be used for predictive market analysis using machine learning techniques. The data will be modelled using a machine learning algorithm, and predictions will be made with one of the algorithms identified in the literature review; several algorithms are suitable for making such predictions.
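Because the specific algorithm is left open here, the sketch below shows only a seasonal-average baseline: each calendar month is forecast as the mean of that month in previous years, which captures recurring seasonal peaks. It is not a machine learning model; it is the kind of baseline that any chosen model should outperform. The data layout and function names are assumptions.

```python
from collections import defaultdict
from statistics import mean


def seasonal_forecast(monthly_sales):
    """Forecast each calendar month as the average of that month in past years.

    `monthly_sales` maps (year, month) -> units sold.
    Returns {month: forecast}, ordered by month number.
    """
    by_month = defaultdict(list)
    for (_year, month), units in monthly_sales.items():
        by_month[month].append(units)
    return {month: mean(values) for month, values in sorted(by_month.items())}


def peak_months(forecast, top_n=2):
    """Return the months with the highest forecast demand, highest first."""
    return sorted(forecast, key=lambda m: (-forecast[m], m))[:top_n]


if __name__ == "__main__":
    # Hypothetical two years of sales for three months.
    sales = {
        (2018, 4): 120, (2019, 4): 140,
        (2018, 7): 80, (2019, 7): 90,
        (2018, 12): 300, (2019, 12): 320,
    }
    forecast = seasonal_forecast(sales)
    print(forecast)               # {4: 130, 7: 85, 12: 310}
    print(peak_months(forecast))  # [12, 4]
```

Comparing a candidate model's error against this baseline on held-out months gives a simple check that the added complexity actually pays off.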
4.2 Non-Functional Requirements
Runtime Performance – System response time should be fast.
Usability – The brain box should be user-friendly and easy to use.
The application should work correctly in any modern web browser.
The application should require minimal data to work.
5 CHAPTER 5
5.1 PROFESSIONAL, LEGAL AND ETHICAL ISSUES
5.1.1 Professional Issues and Legal Issues
All code, libraries and papers used in this project will be referenced and used according to the terms and conditions of their publishers' licenses. The prototype will be developed, tested and documented according to professional product design practices. Any external information will be cited and referenced accordingly.
5.1.2 Ethical Issues
This research involves human subjects, who will answer questionnaires so that we can learn about their shopping patterns and activities when shopping online. All participants will be informed that their personal data will not be shared at any point, and they will be thanked for their effort after completing the questionnaires. The risk of violating any ethical code is therefore minimal.
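One way to support the commitment that participants' personal data will not be shared is to pseudonymise identifiers before anything is stored. The sketch below, an assumed approach rather than one specified in this proposal, replaces the e-mail address with an HMAC-SHA256 keyed hash and drops the name and e-mail fields; the field names and key handling are hypothetical, and the key would need to be kept separate from the data.

```python
import hashlib
import hmac


def pseudonymise(identifier, secret_key):
    """Replace a personal identifier (e.g. an e-mail address) with a keyed hash.

    HMAC-SHA256 yields a stable pseudonym, so answers from the same person can
    still be linked, but the pseudonym cannot be recomputed without the key.
    """
    normalised = identifier.strip().lower().encode("utf-8")
    return hmac.new(secret_key, normalised, hashlib.sha256).hexdigest()


def strip_personal_data(response, secret_key, personal_fields=("name", "email")):
    """Copy a questionnaire response without personal fields, adding a pseudonymous id."""
    cleaned = {k: v for k, v in response.items() if k not in personal_fields}
    cleaned["participant_id"] = pseudonymise(response["email"], secret_key)
    return cleaned


if __name__ == "__main__":
    key = b"replace-with-a-secret-kept-outside-the-dataset"  # hypothetical key
    response = {"name": "Ann", "email": "Ann@Example.com", "shops_online_weekly": True}
    print(strip_personal_data(response, key))
```

Normalising the e-mail before hashing means that minor differences in capitalisation or whitespace still map to the same participant.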
6 CHAPTER 6
This chapter is concerned with the methodology. It explains the steps that will be followed to make the project a reality, with the main goal of describing the procedures needed to move the data brain box from a concept to a practical tool. Although the process may be lengthy and consume a great deal of time and resources, it is achievable, and the result will be useful for e-commerce businesses, especially those that do not yet have a strong online presence.
6.1 PROJECT PLAN
Planning is vital to ensuring that the data brain box project is realised. To start with, the services of various professionals will be sought. The most important of these will be software engineers who specialise in artificial intelligence and data analytics. Business managers and chief executive officers will also be involved or consulted: since the data brain box aims to solve a problem pertinent to e-commerce businesses, it is important to include managers and chief executive officers of such businesses. In addition, all laws concerning data analytics, both local and international, must be well researched so that they can be adhered to when developing the data brain box software. Ethically, the software must not violate any rights, for example the right to privacy. Careful planning will therefore ensure that all of these legal and ethical measures are adhered to.
6.2 METHODOLOGY
To actualize the idea of the data brain box, the first step is to establish a platform that can integrate data from different sources. The software platform should be able to analyse data from sources such as emails, social media and information collected through a business website. To build the platform, artificial intelligence techniques and tools such as the Natural Language Toolkit (NLTK) will be used. A corpus of data from different sources will be used to train the models. Because the platform will use neural networks, it will be able to keep learning from new data and make appropriate deductions after this initial training. Therefore, once an e-commerce website starts using the platform, the data brain box will analyse only the data pertinent to that particular business. Since the brain box will be cross-platform software, it will run on platforms such as Windows, Apple and Android. It will also be able to integrate and analyse data from different sources and draw important conclusions. The software will also help predict information such as the status of the stock market and other information that a business may be interested in. Finally, the software will be extensible, so that it can be programmed further to meet any unique needs a business may have.
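The integration step can be sketched as small adapters that map each source (e-mail, social media, website reviews) onto one common record type, followed by a shared analysis pass. Here the analysis is plain regex tokenisation and term counting from the Python standard library; NLTK's tokenisers or a trained neural model could replace it in practice. The input shapes, stopword list and function names are all hypothetical.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Record:
    """Common shape for text arriving from any source."""
    source: str
    customer: str
    text: str


def from_email(msg):
    # Hypothetical input shape: {"from": ..., "subject": ..., "body": ...}
    return Record("email", msg["from"], f"{msg['subject']} {msg['body']}")


def from_social(post):
    # Hypothetical input shape: {"user": ..., "content": ...}
    return Record("social", post["user"], post["content"])


def from_site_review(review):
    # Hypothetical input shape: {"customer_id": ..., "review": ...}
    return Record("website", str(review["customer_id"]), review["review"])


STOPWORDS = {"the", "a", "an", "and", "is", "it", "i", "my", "for", "to", "of", "this"}


def top_terms(records, n=3):
    """Count the most frequent non-stopword terms across all sources."""
    counts = Counter()
    for record in records:
        for token in re.findall(r"[a-z']+", record.text.lower()):
            if token not in STOPWORDS:
                counts[token] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    records = [
        from_email({"from": "ann@example.com", "subject": "Delivery delay",
                    "body": "My yoga mat delivery is late"}),
        from_social({"user": "@bob", "content": "Love this yoga mat"}),
        from_site_review({"customer_id": 42, "review": "Great yoga mat, fast delivery"}),
    ]
    print(top_terms(records))
```

Normalising every source into one record type first means that new sources only need a new adapter, while the analysis code stays unchanged.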
6.3 RISK MANAGEMENT
It is important to anticipate potential risks and put countermeasures in place in case they arise. The most pertinent risk to this software is hacking. Cybercrime is common today, and even some of the best software systems, such as those of banks, are vulnerable to it (Shmueli, Bruce, Yahav, Patel, & Lichtendahl, 2017). One way to minimise this risk is to build a robust system with no known loopholes. In addition, cyber security experts will monitor the system constantly to ensure that all is in place. The software will also be upgraded regularly with emerging technologies to provide further protection.
Bibliography
Acemoglu, D., & Restrepo, P. (2018). Artificial intelligence, automation and work (No.
w24196). National Bureau of Economic Research.
Ahmed, E., Yaqoob, I., Hashem, I. A. T., Khan, I., Ahmed, A. I. A., Imran, M., & Vasilakos,
A. V. (2017). The role of big data analytics in Internet of Things. Computer
Networks, 129, 459-471.
Akter, S. & Wamba, S., 2016. Big Data Analytics in E-commerce: a systematic review and
agenda for future research. Electron Markets. [Online]
Alpaydin, E. (2020). Introduction to machine learning. MIT press.
Appelbaum, D., Kogan, A., Vasarhelyi, M., & Yan, Z. (2017). Impact of business analytics
and enterprise systems on managerial accounting. International Journal of Accounting
Information Systems, 25, 29-44.
Ashrafi, A., Ravasan, A. Z., Trkman, P., & Afshari, S. (2019). The role of business analytics
capabilities in bolstering firms’ agility and performance. International Journal of
Information Management, 47, 1-15.
Available at: https://doi.org/10.1007/s12525-016-0219-0 [Accessed 1 March 2020].
Available at: https://files.eric.ed.gov/fulltext/ED536788.pdf [Accessed 1 March 2020].
Aydiner, A. S., Tatoglu, E., Bayraktar, E., Zaim, S., & Delen, D. (2019). Business analytics
and firm performance: The mediating role of business process performance. Journal
of business research, 96, 228-237.
Bichler, M., Heinzl, A., & van der Aalst, W. M. (2017). Business analytics and data science:
once again?
Blum, A., Hopcroft, J., & Kannan, R. (2020). Foundations of data science. Cambridge
University Press.
Cao, L. (2017). Data science: a comprehensive overview. ACM Computing Surveys (CSUR),
50(3), 1-42.
Chambers, J. M. (2018). Graphical methods for data analysis. CRC Press.
Bibliography
Acemoglu, D., & Restrepo, P. (2018). Artificial intelligence, automation and work (No.
w24196). National Bureau of Economic Research.
Ahmed, E., Yaqoob, I., Hashem, I. A. T., Khan, I., Ahmed, A. I. A., Imran, M., & Vasilakos,
A. V. (2017). The role of big data analytics in Internet of Things. Computer
Networks, 129, 459-471.
Akter, S. & Wamba, S., 2016. Big Data Analytics in E-commerce: a systematic review and
agenda for future research. Electron Markets. [Online]
Alpaydin, E. (2020). Introduction to machine learning. MIT press.
Appelbaum, D., Kogan, A., Vasarhelyi, M., & Yan, Z. (2017). Impact of business analytics
and enterprise systems on managerial accounting. International Journal of Accounting
Information Systems, 25, 29-44.
Ashrafi, A., Ravasan, A. Z., Trkman, P., & Afshari, S. (2019). The role of business analytics
capabilities in bolstering firms’ agility and performance. International Journal of
Information Management, 47, 1-15.
Available at: https://doi.org/10.1007/s12525-016-0219-0 [Accessed 1 March 2020].
Available at: https://files.eric.ed.gov/fulltext/ED536788.pdf [Accessed 1 March 2020].
Aydiner, A. S., Tatoglu, E., Bayraktar, E., Zaim, S., & Delen, D. (2019). Business analytics
and firm performance: The mediating role of business process performance. Journal
of business research, 96, 228-237.
Bichler, M., Heinzl, A., & van der Aalst, W. M. (2017). Business analytics and data science:
once again?
Blum, A., Hopcroft, J., & Kannan, R. (2020). Foundations of data science. Cambridge
University Press.
Cao, L. (2017). Data science: a comprehensive overview. ACM Computing Surveys (CSUR),
50(3), 1-42.
Chambers, J. M. (2018). Graphical methods for data analysis. CRC Press.
36
Chiang, R. H., Grover, V., Liang, T. P., & Zhang, D. (2018). Strategic value of big data and
business analytics.
Dai, H. N., Wong, R. C. W., Wang, H., Zheng, Z., & Vasilakos, A. V. (2019). Big data
analytics for large-scale wireless networks: Challenges and opportunities. ACM
Computing Surveys (CSUR), 52(5), 1-36.
Donoho, D. (2017). 50 years of data science. Journal of Computational and Graphical
Statistics, 26(4), 745-766.
Duan, Y., Cao, G., & Edwards, J. S. (2020). Understanding the impact of business analytics
on innovation. European Journal of Operational Research, 281(3), 673-686.
Eldén, L. (2019). Matrix methods in data mining and pattern recognition (Vol. 15). Siam.
Gorunescu, F., 2011. Data Mining Concepts, Models and Techniques. s.l.: Springer Science
& Business Media.
Gupta, M., & George, J. F. (2016). Toward the development of a big data analytics
capability. Information & Management, 53(8), 1049-1064.
Han, J. & Kamber, M., 2011. Data Mining: Concepts and Techniques. San Francisco ed.s.l.:
Morgan-Kaufmann Academic Press.
Hand, D., 1998. Data Mining Statistics and More. s.l.: The American Statistician.
Hassabis, D., Kumaran, D., Summerfield, C., & Botvinick, M. (2017). Neuroscience-inspired
artificial intelligence. Neuron, 95(2), 245-258.
Hofmann, M., & Klinkenberg, R. (Eds.). (2016). RapidMiner: Data mining use cases and
business analytics applications. CRC Press.
Jackson, P. C. (2019). Introduction to artificial intelligence. Courier Dover Publications.
Kubick, W. 2012. Big Data, Information and Meaning In: Clinical Trial Insights. s.l.:s.n.
Laursen, G. H., & Thorlund, J. (2016). Business analytics for managers: Taking business
intelligence beyond reporting. John Wiley & Sons.
Leskovec, J., Rajaraman, A., & Ullman, J. D. (2020). Mining of massive data sets.
Cambridge university press.
Chiang, R. H., Grover, V., Liang, T. P., & Zhang, D. (2018). Strategic value of big data and
business analytics.
Dai, H. N., Wong, R. C. W., Wang, H., Zheng, Z., & Vasilakos, A. V. (2019). Big data
analytics for large-scale wireless networks: Challenges and opportunities. ACM
Computing Surveys (CSUR), 52(5), 1-36.
Donoho, D. (2017). 50 years of data science. Journal of Computational and Graphical
Statistics, 26(4), 745-766.
Duan, Y., Cao, G., & Edwards, J. S. (2020). Understanding the impact of business analytics
on innovation. European Journal of Operational Research, 281(3), 673-686.
Eldén, L. (2019). Matrix methods in data mining and pattern recognition (Vol. 15). Siam.
Gorunescu, F., 2011. Data Mining Concepts, Models and Techniques. s.l.: Springer Science
& Business Media.
Gupta, M., & George, J. F. (2016). Toward the development of a big data analytics
capability. Information & Management, 53(8), 1049-1064.
Han, J. & Kamber, M., 2011. Data Mining: Concepts and Techniques. San Francisco ed.s.l.:
Morgan-Kaufmann Academic Press.
Hand, D., 1998. Data Mining Statistics and More. s.l.: The American Statistician.
Hassabis, D., Kumaran, D., Summerfield, C., & Botvinick, M. (2017). Neuroscience-inspired
artificial intelligence. Neuron, 95(2), 245-258.
Hofmann, M., & Klinkenberg, R. (Eds.). (2016). RapidMiner: Data mining use cases and
business analytics applications. CRC Press.
Jackson, P. C. (2019). Introduction to artificial intelligence. Courier Dover Publications.
Kubick, W. 2012. Big Data, Information and Meaning In: Clinical Trial Insights. s.l.:s.n.
Laursen, G. H., & Thorlund, J. (2016). Business analytics for managers: Taking business
intelligence beyond reporting. John Wiley & Sons.
Leskovec, J., Rajaraman, A., & Ullman, J. D. (2020). Mining of massive data sets.
Cambridge university press.
37
Lowndes, J. S. S., Best, B. D., Scarborough, C., Afflerbach, J. C., Frazier, M. R., O’Hara, C.
C., & Halpern, B. S. (2017). Our path to better science in less time using open data
science tools. Nature ecology & evolution, 1(6), 1-7
Lu, H., Li, Y., Chen, M., Kim, H., & Serikawa, S. (2018). Brain intelligence: go beyond
artificial intelligence. Mobile Networks and Applications, 23(2), 368-375.
Marjani, M., Nasaruddin, F., Gani, A., Karim, A., Hashem, I. A. T., Siddiqa, A., & Yaqoob,
I. (2017). Big IoT data analytics: architecture, opportunities, and open research
challenges. IEEE Access, 5, 5247-5261.
McAfee, A. & Brynjolfsson, E., 2012. Big Data The management revolution. s.l.: Havard
Business Review.
Menke, W. (2018). Geophysical data analysis: Discrete inverse theory. Academic press.
Migrant & Seasonal Head Start Technical Assistant Center, n.d. Introduction to Data
Analysis Handbook. [Online]
Mikalef, P., Pappas, I. O., Krogstie, J., & Pavlou, P. A. (2019). Big data and business
analytics: A research agenda for realizing business value.
Mohri, M., Rostamizadeh, A., & Talwalkar, A. (2018). Foundations of machine learning.
MIT press.
Pappas, I. O., Mikalef, P., Giannakos, M. N., Krogstie, J., & Lekakos, G. (2018). Big data
and business analytics ecosystems: paving the way towards digital transformation and
sustainable societies.
Peterson, R., 2018. 6 essential steps in data mining process. s.l.:s.n.
Real, E., Shlens, J., Mazzocchi, S., Pan, X., & Vanhoucke, V. (2017). Youtube-
boundingboxes: A large high-precision human-annotated data set for object detection
in video. In Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition (pp. 5296-5305).
Roiger, R. J. (2017). Data mining: a tutorial-based primer. CRC press.
Russom , P., 2011. Big Data Analytics. s.l.:TDWI Best Practices Report.
Lowndes, J. S. S., Best, B. D., Scarborough, C., Afflerbach, J. C., Frazier, M. R., O’Hara, C.
C., & Halpern, B. S. (2017). Our path to better science in less time using open data
science tools. Nature ecology & evolution, 1(6), 1-7
Lu, H., Li, Y., Chen, M., Kim, H., & Serikawa, S. (2018). Brain intelligence: go beyond
artificial intelligence. Mobile Networks and Applications, 23(2), 368-375.
Marjani, M., Nasaruddin, F., Gani, A., Karim, A., Hashem, I. A. T., Siddiqa, A., & Yaqoob,
I. (2017). Big IoT data analytics: architecture, opportunities, and open research
challenges. IEEE Access, 5, 5247-5261.
McAfee, A. & Brynjolfsson, E., 2012. Big Data The management revolution. s.l.: Havard
Business Review.
Menke, W. (2018). Geophysical data analysis: Discrete inverse theory. Academic press.
Migrant & Seasonal Head Start Technical Assistant Center, n.d. Introduction to Data
Analysis Handbook. [Online]
Mikalef, P., Pappas, I. O., Krogstie, J., & Pavlou, P. A. (2019). Big data and business
analytics: A research agenda for realizing business value.
Mohri, M., Rostamizadeh, A., & Talwalkar, A. (2018). Foundations of machine learning.
MIT press.
Pappas, I. O., Mikalef, P., Giannakos, M. N., Krogstie, J., & Lekakos, G. (2018). Big data
and business analytics ecosystems: paving the way towards digital transformation and
sustainable societies.
Peterson, R., 2018. 6 essential steps in data mining process. s.l.:s.n.
Real, E., Shlens, J., Mazzocchi, S., Pan, X., & Vanhoucke, V. (2017). Youtube-
boundingboxes: A large high-precision human-annotated data set for object detection
in video. In Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition (pp. 5296-5305).
Roiger, R. J. (2017). Data mining: a tutorial-based primer. CRC press.
Russom , P., 2011. Big Data Analytics. s.l.:TDWI Best Practices Report.
Seddon, P. B., Constantinidis, D., Tamm, T., & Dod, H. (2017). How does business analytics
contribute to business value? Information Systems Journal, 27(3), 237-269.
Shearer, C. (2000). The CRISP-DM model: The new blueprint for data mining. Journal of Data
Warehousing.
Shmueli, G., Bruce, P. C., Yahav, I., Patel, N. R., & Lichtendahl Jr, K. C. (2017). Data
mining for business analytics: concepts, techniques, and applications in R. John Wiley
& Sons.
Steels, L., & Brooks, R. (Eds.). (2018). The artificial life route to artificial intelligence:
Building embodied, situated agents. Routledge.
Tan, P. N., Steinbach, M., & Kumar, V. (2016). Introduction to data mining. Pearson
Education India.
Tegmark, M. (2017). Life 3.0: Being human in the age of artificial intelligence. Knopf.
Van Der Aalst, W. (2016). Data science in action. In Process mining (pp. 3-23). Springer,
Berlin, Heidelberg.
Varambally, K. V. M. (2020). “Sustainability and Amazon”–A Case Study on Amazon
Company. Our Heritage, 68(1), 12094-12100.
Vershynin, R. (2018). High-dimensional probability: An introduction with applications in
data science (Vol. 47). Cambridge University Press.
Vidgen, R., Shaw, S., & Grant, D. B. (2017). Management challenges in creating value from
business analytics. European Journal of Operational Research, 261(2), 626-639.
Wamba, S. F., Gunasekaran, A., Akter, S., Ren, S. J. F., Dubey, R., & Childe, S. J. (2017).
Big data analytics and firm performance: Effects of dynamic capabilities. Journal of
Business Research, 70, 356-365.
Yuan, Y. (2018). Alibaba Group: Development and Influence (No. 2018-26-13).
Zhou, J. (2017). Big data analytics and intelligence at Alibaba cloud. In Proceedings of the
Twenty-Second International Conference on Architectural Support for Programming
Languages and Operating Systems.
Seddon, P. B., Constantinidis, D., Tamm, T., & Dod, H. (2017). How does business analytics
contribute to business value? Information Systems Journal, 27(3), 237-269.
Shearer, C., 2000. The Crisp-dm model: The new blueprint for data mining. s.l.:J Data
Warehouse.
Shmueli, G., Bruce, P. C., Yahav, I., Patel, N. R., & Lichtendahl Jr, K. C. (2017). Data
mining for business analytics: concepts, techniques, and applications in R. John Wiley
& Sons.
Steels, L., & Brooks, R. (Eds.). (2018). The artificial life route to artificial intelligence:
Building embodied, situated agents. Routledge.
Tan, P. N., Steinbach, M., & Kumar, V. (2016). Introduction to data mining. Pearson
Education India.
Tegmark, M. (2017). Life 3.0: Being human in the age of artificial intelligence. Knopf.
Van Der Aalst, W. (2016). Data science in action. In Process mining (pp. 3-23). Springer,
Berlin, Heidelberg.
Varambally, K. V. M. (2020). “Sustainability and Amazon”–A Case Study on Amazon
Company. Our Heritage, 68(1), 12094-12100.
Vershynin, R. (2018). High-dimensional probability: An introduction with applications in
data science (Vol. 47). Cambridge university press.
Vidgen, R., Shaw, S., & Grant, D. B. (2017). Management challenges in creating value from
business analytics. European Journal of Operational Research, 261(2), 626-639.
Wamba, S. F., Gunasekaran, A., Akter, S., Ren, S. J. F., Dubey, R., & Childe, S. J. (2017).
Big data analytics and firm performance: Effects of dynamic capabilities. Journal of
Business Research, 70, 356-365.
Yuan, Y. (2018). Alibaba Group: Development and Influence (No. 2018-26-13).
Zhou, J. (2017). Big data analytics and intelligence at Alibaba cloud. In Proceedings of the
Twenty-Second International Conference on Architectural Support for Programming
Languages and Operating Systems.