Data Brain Box Project: Development for E-commerce Data Analysis

Added on  2022/09/18

Project
AI Summary
This project focuses on developing a 'data brain box' to aid e-commerce businesses in data analysis, visualization, and decision-making. It addresses the under-utilization of data in e-commerce, where vast amounts of customer and sales data often go unanalyzed. The project aims to create a system that collects data from various sources, transforms it into usable information, stores it efficiently, and provides visualizations and predictive analysis. Methodologies include expert consultations, the use of AI technologies like neural networks and natural language processing, and adherence to legal and ethical standards. The system will focus on functional requirements such as data collection, storage, and predictive analysis, as well as non-functional requirements. Risk management, particularly concerning cyber security, is also addressed through constant monitoring and security measures.
Data Brain Box
Student’s Name
Institutional Affiliation
Abstract
Data analysis is an important component of e-commerce businesses. In the
contemporary business environment, businesses have to deal with vast amounts of data. To
aid the process of analysing, visualizing and drawing important conclusions from business
data, this paper focuses on how to develop a data brain box that can be used for that purpose.
While there are several tools available for data analysis, the data brain box will be unique
since it will be used across different platforms and will have the ability to analyse data from
different sources. Big data analytics is very important in modern e-commerce businesses
as it helps collect and analyse data which can aid a business in making important decisions.
The amount of data that a business has to deal with has been growing exponentially in the last
few years and this trend is expected to continue. The methodology will involve seeking the
services and opinions of different experts including software engineers and business leaders.
The software will be built using some of the latest technologies in artificial intelligence such
as neural networks and natural language processing. All applicable local and international
laws will be reviewed to ensure compliance. The potential risk of cybercrime will be
minimised by ensuring that no loopholes are left. In addition, the software
will be constantly monitored by a team of cyber security experts.
Acknowledgement
Contents
Abstract
Acknowledgement
1.1 Introduction
1.1.1 Overview of Data Mining
1.1.2 Motivation
1.1.3 Aim and Objectives
1.2 Report Structure
2 CHAPTER 2
2.1 Literature Review
2.2 Big Data Analytics
2.2.1 Volume
2.2.2 Velocity
2.2.3 Value
2.2.4 Variety
2.2.5 Veracity
2.3 Data Mining
2.4 Data Mining Steps
2.4.1 BUSINESS UNDERSTANDING
2.4.2 DATA UNDERSTANDING
2.4.3 DATA PREPARATION
2.4.4 MODELING
2.4.5 EVALUATION
2.4.6 DEPLOYMENT
2.5 DATA MINING MODEL AND TECHNIQUES
2.5.1 ASSOCIATION RULES
2.5.2 CLASSIFICATION
2.5.3 ARTIFICIAL NEURAL NETWORK (ANN)
2.5.4 DECISION TREE
2.5.5 RANDOM FOREST
2.6 RELATED WORK
2.6.1 Market Basket Analysis
2.6.1.1 Sales Forecasting
2.6.1.2 Customer Profiling
2.6.1.3 Click Stream Data
2.6.1.4 Use of Data Brain Boxes in the Past
3 CHAPTER 3
3.1 Evaluation Methodology
3.2 Development Methodology
4 CHAPTER 4
4.1 Requirements Analysis
4.1.1 Functional Requirements
4.1.2 Data Collection
4.1.3 Data Storage
4.1.4 Data Analytics
4.1.5 Predictive Analysis
4.2 Non-Functional Requirements
5 CHAPTER 5
5.1 PROFESSIONAL, LEGAL AND ETHICAL ISSUES
5.1.1 Professional Issues and Legal Issues
5.1.2 Ethical Issues
6 CHAPTER 6
6.1 PROJECT PLAN
6.2 METHODOLOGY
6.3 RISK MANAGEMENT
Bibliography
1.1 Introduction
Businesses are happy when they have more data about their operations, the wants of
their customers and, most importantly, the results of strategy implementation. However, when
they have this data, they may not know exactly what to do with it. The inability of businesses
to utilize data can lead to lost revenue opportunities, lower productivity, reduced
effectiveness and quality issues.
This thesis discusses the under-utilization of data in businesses such as E-commerce
and how this under-utilized data can be processed into something useful.
E-commerce businesses obtain a lot of information about their customers; data is
obtained whenever purchases are made or whenever products are viewed on a website. Over
the past years, there has been an increase in the need for data in the e-commerce industry.
This is due to the fact that e-commerce companies that are data driven experience a higher
level of productivity than their competitors (McAfee & Brynjolfsson, 2012).
A recent study carried out by BSA Software Alliance shows that data analysis
contributes to 15% or more of the growth for 56% of firms. Therefore, 91% of Fortune 1000
companies are investing in data analysis projects, an 85% increase from the previous years
(Akter & Wamba, 2016). At the same time, the use of internet-based technologies
provides e-commerce companies with transformative benefits such as real-time customer
service, pricing options or personalized offers. Data mining helps solidify these
benefits by supporting informed decisions based on critical insights, and it allows companies
to use data more efficiently to drive a higher customer conversion rate.
It is very important for e-commerce businesses to have a smart way of gaining
insight into what consumers want to see when they visit the site, in order to get the best out
of their business. The objective is to develop a data brain box that provides data collection,
data transformation, data storage and visualization.
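The four stages named in this objective can be sketched as a small pipeline. This is only an illustrative sketch under stated assumptions, not the design of the data brain box itself: the sample events, the function names (`collect`, `transform`, `store`, `visualize`), the `sales` table and the text bar chart are all hypothetical, and an in-memory SQLite database stands in for whatever storage the final system would use.

```python
import sqlite3

# Hypothetical raw events, as they might arrive from a web shop (invented data).
RAW_EVENTS = [
    {"product": " Shoes ", "amount": "59.50"},
    {"product": "hat", "amount": "15.00"},
    {"product": "SHOES", "amount": "40.50"},
]


def collect():
    """Data collection: here we simply return the hard-coded sample events."""
    return list(RAW_EVENTS)


def transform(events):
    """Data transformation: normalise product names and parse amounts to floats."""
    return [(e["product"].strip().lower(), float(e["amount"])) for e in events]


def store(rows, connection):
    """Data storage: write the cleaned rows into a (hypothetical) sales table."""
    connection.execute("CREATE TABLE IF NOT EXISTS sales (product TEXT, amount REAL)")
    connection.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    connection.commit()


def visualize(connection):
    """Visualization: one text bar per product, one '#' per 10 currency units."""
    query = "SELECT product, SUM(amount) FROM sales GROUP BY product ORDER BY product"
    lines = []
    for product, total in connection.execute(query):
        lines.append(f"{product:<6} {'#' * int(total // 10)} {total:.2f}")
    return lines


connection = sqlite3.connect(":memory:")  # assumption: in-memory store for the sketch
store(transform(collect()), connection)
chart = visualize(connection)
print("\n".join(chart))
```

Keeping each stage as a separate function means a real data source, database or charting tool could replace any one stage without touching the others.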
1.1.1 Overview of Data Mining
The 1973 Webster’s New Collegiate Dictionary defines data as “factual
information used as a basis for reasoning, discussion, or calculation.” The 1996 version of
the Webster Dictionary defined data as “information, especially information organized for
analysis” (Migrant & Seasonal Head Start Technical Assistant Center, n.d.).
From the definitions above, a more practical way of defining data is that data is a
collection of numbers, characters, images or other methods of recording, in a form which can
be assessed to make a decision about a specific action. By closely analysing data we can find
patterns and derive information which can be used to enhance knowledge (Migrant &
Seasonal Head Start Technical Assistant Center, n.d.).
Data mining is therefore a form of business intelligence and data analysis. It is the
process of digging into large, unstructured data to get useful correlations or predictions from
it (Han & Kamber, 2011).
1.1.2 Motivation
I am a product designer and have worked with several startup businesses in the
e-commerce industry. From my experience over the years, I have realized that most
e-commerce companies have no idea what to do after they have their website or
applications developed, aside from uploading products and selling to the few consumers they
have access to. Some do not even know the true value of the data they get from their sales. So,
the motivation for this thesis is to bridge the gap of this under-utilization.
1.1.3 Aim and Objectives
The aim of this dissertation is to investigate some effective ways in which businesses
can utilize available data to increase sales and return on investment. Core to this investigation
will be data mining techniques and various algorithms that could help achieve the task
mentioned above. These algorithms include but are not limited to neural networks, decision
trees and machine learning. This paper also aims to develop a prototype for a data brain box
that could help e-commerce businesses collect relevant data and utilize it to the advantage of
the business.
1.2 Report Structure
My report is outlined as follows.
Chapter 2- This chapter comprises the literature review, which gives a summary of
various algorithms and technologies for data mining, data warehousing, data visualization and
predictive analysis
Chapter 3- This chapter identifies the requirements analysis of the project
Chapter 4- This chapter covers project implementation and evaluation
Chapter 5- This chapter describes the professional, legal, ethical and social issues that
can be associated with the project
Chapter 6- This chapter provides the project plan of the project
2 CHAPTER 2
2.1 Literature Review
This chapter provides a literature review on data mining and predictive analysis for business
marketing data. We will introduce some of the core techniques, concepts and solutions for
data mining in order to meet the aims and objectives of this project. In contemporary
society, technology has been integrated into almost all facets of our lives. Businesses have not
been left behind. Businesses form a significant number of the organizations that exist in the
modern day (Lowndes et al., 2017). With technology now more advanced than at any earlier
point in history, it is very important for businesses to take advantage of these
technologies to increase their sales and consequently maximize their profits (Gupta &
George, 2016). Let us take a simple example. Consider the number of people who use
smartphones. There are several billion such people in the world. These people are most likely
to search online for the products or services that they need. Businesses can tap into such an
opportunity for their own advantage. It is worth noting that many businesses have not
invested in data mining and data analysis (Tan, Steinbach & Kumar, 2016). If businesses
could tap into the field of data collection and data analysis and use the data to make important
predictions, their chances of success would increase. Considering the highly
competitive nature of business in the modern world, it is only wise for businesses to
consider venturing into data collection and analysis. It is for this reason that this paper aims to
investigate ways in which e-commerce businesses can tap into the huge amounts of data that
exist, make sense of this data and use it to make important predictions and decisions
concerning their businesses. There exists extensive evidence to show that businesses with
effective social media marketing are more likely to succeed compared to their counterparts
who have not invested in this kind of marketing (Jackson, 2019). It would be important for
e-commerce businesses to consider having a heavy social media presence (Dai, Wong, Wang,
Zheng & Vasilakos, 2019). In fact, it would be appropriate for them to consider hiring a
social media marketing team. This team should focus on the integration of social media sites
into the e-commerce websites. The team should also be tasked with the responsibility of
uploading appropriate information and responding to any queries or issues that potential
clients may have (Eldén, 2019).
The main goal of such a team would be to ensure that it uses social media to convert
potential customers into buying customers. In addition to carrying out the tasks described
above, the team should also carry out data analytics based on factors such as traffic and the
age and location of potential customers. Here is an example of how these data analytics may
work. Suppose the team discovers that most of the people who are buying from the business
are of a certain age group. Based on that data, the business may dedicate more resources
to targeting that particular age group. Such a move is likely to result in more sales for the
business since the most appropriate group is targeted.
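The age-group example above can be sketched in a few lines of Python. The purchase records, the bracket boundaries and the function names (`age_group`, `orders_by_age_group`) are assumptions made purely for illustration; a real system would read the records from the store's own order data.

```python
from collections import Counter

# Hypothetical purchase records: (customer_age, order_value). Invented data.
purchases = [
    (19, 25.0), (23, 40.0), (27, 15.5), (34, 60.0),
    (24, 32.0), (45, 80.0), (22, 18.0),
]


def age_group(age: int) -> str:
    """Map an age to an illustrative bracket; the boundaries are an assumption."""
    if age < 18:
        return "under 18"
    if age < 25:
        return "18-24"
    if age < 35:
        return "25-34"
    if age < 50:
        return "35-49"
    return "50+"


def orders_by_age_group(records):
    """Count how many orders fall into each age bracket."""
    return Counter(age_group(age) for age, _ in records)


counts = orders_by_age_group(purchases)
top_group, top_orders = counts.most_common(1)[0]
print(f"Most active age group: {top_group} ({top_orders} of {len(purchases)} orders)")
```

With a result like this, the marketing team would know which bracket to prioritise when allocating its budget.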
Email marketing is another tool that can be integrated into e-commerce websites to
help collect appropriate data about the customers (Steels & Brooks, 2018). With a tool that
manages mail, the business can send promotional messages to appropriate
customers. Email marketing may provide a unique kind of data to the business and may
help increase sales (Hassabis, Kumaran, Summerfield, & Botvinick, 2017). Let us take a
specific example concerning email marketing. Suppose a customer visits an e-commerce
site, places an order for an item and subscribes to the mailing list. Email can be used to
update the customer on the status of their order right from when they purchase the item to
when the item is shipped. If the same customer becomes a regular customer, email marketing
data can be used to notice this. The business may use such information to offer incentives to
the loyal customer. For instance, an email may be sent to the customer offering them a 10
percent reduction in price the next time they buy an item from the business.
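A minimal sketch of this loyalty example follows. The order log, the threshold of three orders for a "regular" customer and the helper names (`loyal_customers`, `discount_message`) are hypothetical; only the 10 percent figure comes from the example above. The sketch builds the message text and does not send any mail.

```python
from collections import Counter

# Hypothetical order log: one e-mail address per completed order (invented data).
order_emails = [
    "ann@example.com", "bob@example.com", "ann@example.com",
    "ann@example.com", "cy@example.com", "bob@example.com",
]

LOYALTY_THRESHOLD = 3  # assumption: three or more orders marks a regular customer
DISCOUNT_PERCENT = 10  # the 10 percent reduction used in the example above


def loyal_customers(emails, threshold=LOYALTY_THRESHOLD):
    """Return the addresses that appear at least `threshold` times, sorted."""
    counts = Counter(emails)
    return sorted(email for email, n in counts.items() if n >= threshold)


def discount_message(email, percent=DISCOUNT_PERCENT):
    """Build the text of a promotional e-mail; nothing is actually sent."""
    return (
        f"To: {email} | Thank you for shopping with us. "
        f"Enjoy {percent}% off your next order."
    )


regulars = loyal_customers(order_emails)
messages = [discount_message(email) for email in regulars]
for message in messages:
    print(message)
```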
In a nutshell, there are numerous ways in which businesses can collect appropriate
data, analyse that data, visualize it and use it to make important business decisions. Therefore,
it is very important for businesses to tap into the tools that exist for making such
important moves. The information provided in this section shows that there is a great need for
businesses to have appropriate tools to help them manage available data in a way that helps
achieve business goals. It follows that the idea of a data brain box, made specifically to help
businesses achieve this, is not only a great one but one that is vital in the modern economy.
We are in the age of artificial intelligence. Therefore, important tools such as machine
learning and neural networks can help create data mining software that is faster and more
effective. The following section will look into more literature. Literature is important as it
helps show what has already been done, the loopholes that may exist and what could be
done better (Blum, Hopcroft & Kannan, 2020).
Data brain boxes were not a common phenomenon with the 4th generation of
computers and the previous generations (Alpaydin, 2020). They have become more common
with the 5th generation, that is, knowledge-based systems. It is estimated that artificial
intelligence, which will be an integral part of the fourth industrial revolution, will bring an
exponential rise in data brain boxes and related technology. Knowledge-based systems, which
are essential in making effective data brain boxes, are going to power the fourth industrial
revolution (Mohri, Rostamizadeh & Talwalkar, 2018). As seen in the discussion above,
businesses are already making use of this important technology to make important business
decisions.
2.2 Big Data Analytics
The term “Big Data” refers to datasets that are too large to work with
using traditional database management systems. Their size
makes it difficult for commonly used software tools and storage devices to capture, manage
and store the data. Because of the complexity and volume of this data, analysis takes longer
(Kubick, 2012).
Every day there is an exponential increase in the amount of data collected by
businesses, ranging from dozens of terabytes (TB) to many petabytes (PB) of data in a dataset.
Currently, some of the problems with this volume of data include capturing, storing,
searching, sharing, analysing and visualizing it. Today businesses are exploring volumes of data
so as to discover knowledge to grow their businesses (Russom, 2011).
Of the many problems created by Big Data, one of the biggest is the
spread of volumes of data across different applications in a business organization. This
scattered information is not very useful on its own, but if the data is merged and processed, a
new dataset can be created which will bring value to the business. In order to get value from this
tremendous amount of data, it is characterized using the five V’s of big data: volume,
velocity, value, variety and veracity.
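The merging of data spread across applications can be illustrated with a short sketch. The two sources (a web shop export and a mailing list export), the field names and the `merge_customer_data` helper are all hypothetical; the point is only that joining records on a shared customer identifier yields a combined profile that neither application holds by itself.

```python
# Hypothetical exports from two separate applications (invented data).
web_shop_orders = [
    {"customer_id": 1, "order_total": 120.0},
    {"customer_id": 2, "order_total": 45.5},
    {"customer_id": 1, "order_total": 30.0},
]
mailing_list = [
    {"customer_id": 1, "subscribed": True},
    {"customer_id": 3, "subscribed": True},
]


def empty_profile():
    """Default profile for a customer seen for the first time."""
    return {"total_spent": 0.0, "orders": 0, "subscribed": False}


def merge_customer_data(orders, subscriptions):
    """Join both sources on customer_id into one profile per customer."""
    merged = {}
    for row in orders:
        profile = merged.setdefault(row["customer_id"], empty_profile())
        profile["total_spent"] += row["order_total"]
        profile["orders"] += 1
    for row in subscriptions:
        profile = merged.setdefault(row["customer_id"], empty_profile())
        profile["subscribed"] = row["subscribed"]
    return merged


profiles = merge_customer_data(web_shop_orders, mailing_list)
for customer_id in sorted(profiles):
    print(customer_id, profiles[customer_id])
```

In the merged view, customer 3 stands out as a subscriber who has never ordered, which is the kind of insight that stays hidden while the two datasets live in separate applications.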
When managed well, big data analytics may open immense opportunities for an
e-commerce business. There exist modern techniques for carrying out analysis
on relatively large data sets. With these technologies, it is possible to analyse the vast amount
of data and draw important conclusions that can help business organizations make well
informed decisions. Big data analytics may be viewed as a form of advanced analysis
involving complex applications, statistical algorithms and elements such as predictive models
(Roiger, 2017). These analytics also perform what-if analysis and are powered by what are
referred to as high-performance analytics systems.
There are numerous advantages that an e-commerce business can gain from
using big data. Some of these advantages will be explained here. As already observed, these
analytics are used to handle large data sets whose utilization can help a business exploit new
opportunities. The analytics may uncover opportunities that were never thought to
exist before (Marjani, Nasaruddin, Gani, Karim, Hashem, Siddiqa & Yaqoob, 2017). It
follows that if businesses want increased profits, improved operations and happy customers, big
data analytics is the way to go. Businesses need these analytics because the amount of
available data is very large and continues to grow exponentially each day (Lu, Li, Chen, Kim
& Serikawa, 2018). It is important for a business to have an idea of the kind of data that is
generated through it. If this information is not analysed, it goes to waste, denying the business
some highly valuable data (Pappas, Mikalef, Giannakos, Krogstie & Lekakos, 2018). To
make matters better for businesses, some great tools exist to help analyse this data (Leskovec,
Rajaraman & Ullman, 2020). Even where these tools do not exist, they can always be
developed. In the past, businesses had to hire a whole team if they wanted to carry out some
analytics. Today, however, modern software carries out these tasks in a highly
reliable manner. The modern software is also fast.
Big data analytics may help an e-commerce business gain a deeper understanding of
the market. With the high-speed memory in these analytics and with the ability to analyse
data in real time, important information about the market can be made available to the business
almost instantly. The market is an important component of any business. Therefore, having a tool
that helps provide appropriate information about the market is a great win for e-commerce
businesses. With appropriate market information, these businesses are able to deliver
products more efficiently. In addition, it becomes possible to manage deadlines with
ease.
Big data analytics can help the business gain a good understanding of the industry.
Since these analytics have the ability to comprehend industry knowledge, they can provide
information to help a business make important decisions about the future (Acemoglu &
Restrepo, 2018). In addition, the analytics can provide information on the kind of economy
available. Information on the kind of economy can help a business in its expansion plans.
Such expansion not only helps the business to grow but also to build a very strong brand.
Although the economy is constantly changing and there is a need for businesses to continuously
adapt to various environments, the main goal for any business remains profit
maximization. Big data analytics helps provide refined information from data sets which
helps a business focus on the areas that maximize profits.
In the light of the observations made above, there is no doubt that big data analytics
is essential to an e-commerce business. It would be true to conclude that big data