Abstract

Data analysis is an important component of e-commerce businesses. In the contemporary business environment, businesses have to deal with vast amounts of data. To aid the process of analysing, visualizing and drawing important conclusions from business data, this paper focuses on how to develop a data brain box that can be used for that purpose. While several tools are available for data analysis, the data brain box will be unique in that it can be used across different platforms and can analyse data from different sources. Big data analytics is very important to modern e-commerce businesses because it helps collect and analyse data that can aid a business in making important decisions. The amount of data that a business has to deal with has been growing exponentially in the last few years and this trend is expected to continue. The methodology will involve seeking the services and opinions of different experts, including software engineers and business leaders. The software will be built using some of the latest technologies in artificial intelligence, such as neural networks and natural language processing. All applicable local and international laws will be reviewed to ensure compliance. The potential risk of cybercrime will be minimised by ensuring that no loopholes are left. In addition, the software will be constantly monitored by a team of cyber security experts.
1.1 Introduction

Businesses are happy when they have more data about their operations, the wants of their customers and, most importantly, the results of strategy implementation. However, when they have this data, they may not know exactly what to do with it. The inability of businesses to utilize data can lead to lost revenue opportunities, lower productivity and effectiveness, and quality issues. This thesis discusses the under-utilization of data in businesses such as e-commerce and how this under-utilized data can be processed into something useful. E-commerce businesses obtain a lot of information about their customers: data is obtained whenever purchases are made or whenever products are viewed on a website. Over the past years, there has been an increase in the need for data in the e-commerce industry. This is because e-commerce companies that are data driven experience a higher level of productivity than their competitors (McAfee & Brynjolfsson, 2012). A recent study carried out by BSA Software Alliance shows that data analysis contributes 15% or more of the growth for 56% of firms. As a result, 91% of Fortune 1000 companies are investing in data analysis projects, an 85% increase from previous years (Akter & Wamba, 2016). At the same time, the use of internet-based technologies provides e-commerce companies with transformative benefits such as real-time customer service, pricing options and personalized offers. Data mining helps solidify these benefits by supporting informed decisions based on critical insights, and allows companies to use data more efficiently to drive a higher customer conversion rate. It is very important for e-commerce businesses to have a smart way of gaining insight into what consumers want to see when they visit the site, in order to get the best out of their business.
The objective is to develop a data brain box that provides data collection, data transformation, data storage and visualization.
1.1.1 Overview of Data Mining

The 1973 Webster’s New Collegiate Dictionary defines data as “factual information used as a basis for reasoning, discussion, or calculation.” The 1996 version of the Webster dictionary defines data as “information, especially information organized for analysis” (Migrant & Seasonal Head Start Technical Assistant Center, n.d.). From these definitions, a more practical way of defining data is as a collection of numbers, characters, images or other recorded output, in a form that can be assessed to make a decision about a specific action. By closely analysing data we can find patterns and perceive information that can be used to enhance knowledge (Migrant & Seasonal Head Start Technical Assistant Center, n.d.). Data mining is therefore a form of business intelligence and data analysis. It is the process of digging into large, unstructured data to get useful correlations or predictions from it (Han & Kamber, 2011).

1.1.2 Motivation

I am a product designer and have worked with several startup businesses in the e-commerce industry. From my experience over the years, I have realized that most e-commerce companies have no idea what to do after their website or application has been developed, aside from uploading products and selling to the few consumers they have access to. Some do not even know the true value of the data they get from their sales. The motivation for this thesis is therefore to bridge this gap of under-utilization.
1.1.3 Aim and Objectives

The aim of this dissertation is to investigate effective ways in which businesses can utilize available data to increase sales and return on investment. Core to this investigation will be data mining techniques and various algorithms that could help achieve this. These algorithms include, but are not limited to, neural networks, decision trees and other machine learning methods. This paper also aims to develop a prototype for a data brain box that could help e-commerce businesses collect relevant data and utilize it to the advantage of the business.

1.2 Report Structure

My report is outlined as follows.
Chapter 2- This chapter comprises the literature review, which gives a summary of various algorithms and technologies for data mining, data warehousing, data visualization and predictive analysis.
Chapter 3- This chapter identifies the requirements analysis of the project.
Chapter 4- This chapter covers project implementation and evaluation.
Chapter 5- This chapter describes the professional, legal, ethical and social issues that can be associated with the project.
Chapter 6- This chapter provides the project plan.
CHAPTER 2

2.1 Literature Review

This chapter provides a literature review on data mining and predictive analysis for business marketing data. We will introduce some of the core techniques, concepts and solutions for data mining in order to meet the aims and objectives of this project. In contemporary society, technology has been integrated into almost all facets of our lives, and businesses have not been left behind. Businesses form a significant share of the organizations that exist today (Lowndes et al., 2017). With technology now more advanced than at any time in history, it is very important for businesses to take advantage of these technologies to increase their sales and consequently maximize their profits (Gupta & George, 2016). Let us take a simple example. Consider the number of people who use smartphones. There are several billion such people in the world. These people are most likely to search online for the products or services that they need. Businesses can tap into such an opportunity for their own advantage. It is worth noting that many businesses have not invested in data mining and data analysis (Tan, Steinbach & Kumar, 2016). If businesses tap into data collection and data analysis and use the data to make important predictions, their chances of success increase. Considering the highly competitive nature of business in the modern world, it is only wise for businesses to consider venturing into data collection and analysis. It is for this reason that this paper aims to investigate ways in which e-commerce businesses can tap into the huge amounts of data that exist, make sense of this data and use it to make important predictions and decisions concerning their businesses. There is extensive evidence that businesses with effective social media marketing are more likely to succeed than their counterparts who have not invested in this kind of marketing (Jackson, 2019).
It would be important for e-commerce businesses to consider having a heavy social media presence (Dai, Wong, Wang, Zheng & Vasilakos, 2019). In fact, it would be appropriate for them to consider hiring a
social media marketing team. This team should focus on the integration of social media sites into the e-commerce website. The team should also be tasked with uploading appropriate information and responding to any queries or issues that potential clients may have (Eldén, 2019). The main goal of such a team would be to use social media to convert potential customers into buying customers. In addition to carrying out the tasks described above, the team should also carry out data analytics based on factors such as traffic, age and location of potential customers. Here is an example of how these data analytics may work. Suppose the team discovers that most of the people who are buying from the business are of a certain age group. Based on that data, the business may dedicate more resources to targeting that particular age group. Such a move is likely to result in more sales for the business since the most appropriate group is targeted.

Email marketing is another tool that can be integrated into e-commerce websites to help collect appropriate data about customers (Steels & Brooks, 2018). With a tool that manages mail, the business can send promotional messages to appropriate customers. Email marketing may provide a unique kind of data to the business and may help increase sales (Hassabis, Kumaran, Summerfield, & Botvinick, 2017). Let us take a specific example concerning email marketing. Suppose a customer visits an e-commerce site, places an order for an item and subscribes to the mailing list. Email can be used to update the customer on the status of their order, from when they purchase the item to when the item is shipped. If the same customer becomes a regular customer, email marketing data can be used to notice this. The business may use such information to offer incentives to the loyal customer.
For instance, an email may be sent to the customer offering them a 10 percent reduction in price the next time they buy an item from the business. In a nutshell, there are numerous ways in which businesses can collect appropriate data, analyse that data, visualize it and use it to make important business decisions. Therefore, it is very important for businesses to tap into the tools that exist for making such
important moves. The information provided in this section shows that there is a great need for businesses to have appropriate tools to help them manage available data in a way that helps achieve business goals. It follows that the idea of a data brain box, designed specifically to help businesses achieve this, is not only a great one but one that is vital in the modern economy. We are in the age of artificial intelligence. Therefore, important tools such as machine learning and neural networks can help create data mining software that is faster and more effective. The following section will look into more literature. Literature is important as it helps establish what has already been done, the loopholes that may exist and what could be done better (Blum, Hopcroft & Kannan, 2020). Data brain boxes were not a common phenomenon with the 4th generation of computers and earlier generations (Alpaydin, 2020). They have become more common with the 5th generation, that is, the knowledge-based system. It is estimated that artificial intelligence, which will be an integral part of the fourth industrial revolution, will bring an exponential rise in data brain boxes and related technology. Knowledge-based systems, which are essential in making effective data brain boxes, are going to power the fourth industrial revolution (Mohri, Rostamizadeh & Talwalkar, 2018). As seen in the discussion above, businesses are already making use of this technology to make important business decisions.

2.2 Big Data Analytics

The term “Big Data” refers to data sets that are too large to work with using traditional database management systems. Their size makes it difficult for commonly used software tools and storage devices to capture, manage and store the data. Because of the complexity and volume of these data, analysis takes longer (Kubick, 2012).
Every day there is an exponential increase in the amount of data collected by businesses, ranging from dozens of terabytes (TB) to many petabytes (PB) in a single data set. Current problems with this volume of data include capturing, storing, searching, sharing, analysing and visualizing it. Today businesses are exploring volumes of data so as to discover knowledge to grow their businesses (Russom, 2011). Of the many problems created by big data, one of the biggest is the spread of data across different applications within a business organization. Data spread in this way is not very useful, but if it is merged and processed, a new data set can be created which brings value to the business. In order to get value from this tremendous amount of data, it is characterized using the five V’s of big data.

When managed well, big data analytics may open immense opportunities for an e-commerce business. Modern techniques exist for carrying out analysis on relatively large data sets. With these technologies, it is possible to analyse vast amounts of data and draw important conclusions that can help business organizations make well-informed decisions. Big data analytics may be viewed as a form of advanced analysis involving complex applications, statistical algorithms and elements such as predictive models (Roiger, 2017). These analytics also perform what-if analysis and are powered by what are referred to as high-performance analytics systems. There are numerous advantages that an e-commerce business can gain from using big data. Some of these advantages are explained here. As already observed, these analytics are used to handle large data sets whose utilization can help a business exploit new opportunities. The analytics may uncover opportunities that were never thought to exist before (Marjani, Nasaruddin, Gani, Karim, Hashem, Siddiqa & Yaqoob, 2017).
It follows that if businesses want increased profits, improved operations and happy customers, big data analytics is the way to go. Businesses need these analytics because the amount of available data is very large and continues to grow exponentially each day (Lu, Li, Chen, Kim & Serikawa, 2018). It is important for a business to have an idea of the kind of data that is
generated through it. If this information is not analysed, it is wasted, denying the business some highly valuable data (Pappas, Mikalef, Giannakos, Krogstie & Lekakos, 2018). Fortunately for businesses, some great tools exist to help analyse this data (Leskovec, Rajaraman & Ullman, 2020). Even where these tools do not exist, they can be developed. In the past, businesses had to hire a whole team if they wanted to carry out analytics. Today, modern software carries out these tasks quickly and in a highly reliable manner.

Big data analytics may help an e-commerce business gain a deeper understanding of the market. With high-speed memory and the ability to analyse data in real time, important information about the market can be made available to the business almost instantly. The market is an important component of any business. Therefore, having a tool that helps provide appropriate information about the market is a great win for e-commerce businesses. With appropriate market information, these businesses are able to deliver products more efficiently. In addition, it becomes possible to manage deadlines with ease.

Big data analytics can help the business gain a good understanding of the industry. Since these analytics can capture industry knowledge, they can provide information to help a business make important decisions about the future (Acemoglu & Restrepo, 2018). In addition, the analytics can provide information on the state of the economy, which can help a business in its expansion plans. Such expansion not only helps the business to grow but also to build a very strong brand. Although the economy is constantly changing and businesses need to adapt continuously to various environments, the main goal of any business remains profit maximization.
Big data analytics helps provide refined information from data sets which helps a business focus on the areas that maximize profits. In the light of the observations made above, there is no doubt that big data analytics are very essential to an e-commerce business. It would be true to conclude that big data
analytics remains one of the most important tools that businesses can use to stay relevant in a world that is constantly changing. In almost all facets of life, data analytics has changed the way people behave. Changes in the lives of people translate into changes in ways of doing business. A business that is resistant to change risks being phased out. Big data analytics can be described as the new age of data, and it comes with great potential for businesses. Going into the future, it will be very important for businesses to make constant use of the available tools to maximize profits and make important business decisions. Some of the most successful businesses, including Amazon and Google, invest heavily in big data analytics. The following section will dive into the five Vs of big data analytics to provide some important insights into what it is and why it is important.

2.2.1 Volume

The volume of data represents the most immediate challenge to the conventional business. Many businesses already have large amounts of saved data, but do not have a means to process these data into something meaningful. Russom (2011) illustrated that big data analysis takes large volumes of data, usually expressed in petabytes and exabytes, which are used for making strategic marketing decisions. The data generated are often unstructured and usually include videos, images or data from mobile technologies. It is unlikely that big data will be clean and free from errors. Although this may pose some challenges in data preparation, big data enables real-time decision making for e-commerce firms (Kang, et al., 2003). For example, Amazon developed a sophisticated recommendation engine that delivers over 35% of sales, automated customer service systems that ensure superior customer satisfaction, and dynamic pricing systems that adjust pricing against competing sites every 15 seconds (Goff, et al., 2012).
There is extensive evidence that the amount of data that companies manage has been increasing exponentially. As Ahmed et al. (2017) observe, it is estimated that since 2012 the amount of data that businesses have to deal with has been doubling every 40 months. Consequently, it is important for businesses to have an effective way of analysing this immense volume of data. As noted above, this data is not without errors. However, big data analytics tools offer ways of pre-processing data and ensuring that most of these errors are handled effectively (Aydiner, Tatoglu, Bayraktar, Zaim & Delen, 2019). A further strength of these analytics is that they can handle immense volumes of data at very high speed (Tegmark, 2017). Cloud storage has come to the aid of storing such data (Donoho, 2017). It could be overwhelming and costly for businesses to store vast amounts of data using traditional means such as hard disk drives or even solid-state drives (Hofmann & Klinkenberg, 2016). It would not be effective or even sustainable. It follows that businesses need tools to analyse large volumes of data in a way that is efficient and cost effective. The best solution in that regard is big data analytical tools.

2.2.2 Velocity

Velocity is the increasing speed at which data is created, that is, the speed at which new data is generated, processed, stored and analysed. The velocity of big data needs to be prioritized and synced into business processes, decision making and improvements in performance (Beuike, 2011). Capitalizing on this high rate of data processing, many e-commerce firms use different techniques to add value to their businesses. For example, eBay has performed several experiments using data velocity with different aspects of its website, which resulted in better
layout and website features. E-commerce businesses now use high-end systems to collect, store and analyse data in order to make real-time decisions. As Wamba, Gunasekaran, Akter, Ren, Dubey & Childe (2017) argue, velocity of information may be even more important than volume. As Bichler, Heinzl & van (2017) observe, when it comes to velocity, the main aim of an e-commerce business should be to ensure that information is received and analysed quickly (as close to real time as possible). It is often better to analyse less data at a relatively high rate than to have huge amounts of data that are analysed at slow speeds. Extensive evidence reveals that velocity is more important than volume since it gives businesses a greater competitive advantage (Laursen & Thorlund, 2016). It follows that an e-commerce business should aim to handle large amounts of data as quickly as possible. However, where the business has to choose between volume and velocity, it may be better to analyse small amounts of data at a higher velocity than to have large amounts of data which slow down velocity (Mikalef, Pappas, Krogstie & Pavlou, 2019). Let us take an example from healthcare. Suppose clinicians are using a big data analytics tool to receive information from patients. It is important that such information flows quickly, for it may mean the difference between life and death.

2.2.3 Value

Value is the worth and usefulness of the data collected. It is all well and good having access to data, but unless we can turn it into something useful it is worthless. Value can be seen as the ability to transform the vast amounts of data available into business.
After a business has invested a great deal of time and resources, big data analytics could be used to help ensure that available data is used to identify potential customers and transform them into paying customers (Vidgen, Shaw & Grant, 2017). The big data can
be used to understand customers better and offer them what they need at the right time (Duan, Cao & Edwards, 2020). For instance, a business can categorize customers based on their behaviour: customers who constantly cancel their orders could be classified as less important to the business, while loyal customers should be identified and rewarded.

2.2.4 Variety

This refers to the different types of data being generated. Data can be structured or unstructured. This is a critical attribute in data analysis as data is generated from multiple sources and in multiple formats (Russom, 2011). It requires the use of different analytical and predictive models so that information about different functional areas can be used. For example, analytical models used by some e-commerce firms could combine customer information, historical data on customer purchases or buying behaviour, seasonal shopping patterns and, above all, data retrieved from social media to make market predictions (Biesdorf, et al., 2013).

Variety may also take the dimension of different sources of data. Different sources here may include email marketing, social networks, mobile phones and websites. The information received from different sources differs in nature. According to Chambers (2018), it is important for a business to have a clear understanding of its most important sources of data and to put in place mechanisms to analyse them. Let us take an example. Suppose an e-commerce business focuses on selling movies and drama. Social media would be a very important medium for collecting and sharing information for such a company. Such a company may also find email marketing useful. When analysing information from email marketing, the focus should not be on how many people have subscribed but on who those people are. Analysis of emails may show that a large share of the people who have subscribed to the video platform are young.
Such information could help the company to focus more on the youth in advertisements and other endeavours.
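The subscriber analysis described above can be sketched in a few lines of Python. This is only an illustration: the subscriber records, the field names and the age brackets are assumptions made for the example and are not taken from any real mailing-list tool.

```python
from collections import Counter

# Hypothetical subscriber records; in practice these would be exported
# from the mailing-list tool. Field names are assumed for illustration.
subscribers = [
    {"email": "a@example.com", "age": 19},
    {"email": "b@example.com", "age": 23},
    {"email": "c@example.com", "age": 41},
    {"email": "d@example.com", "age": 17},
    {"email": "e@example.com", "age": 35},
]

def age_group(age):
    """Map an age to a coarse, illustrative bracket."""
    if age < 25:
        return "under 25"
    if age < 40:
        return "25-39"
    return "40+"

def age_group_shares(records):
    """Return the share of subscribers that falls in each age bracket."""
    counts = Counter(age_group(r["age"]) for r in records)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}

shares = age_group_shares(subscribers)
# The bracket with the largest share is the natural advertising target.
largest = max(shares, key=shares.get)
```

The point of the sketch is that the question asked of the data is "who subscribed" rather than "how many subscribed"; the same grouping could be applied to location or purchase history.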
2.2.5 Veracity

This refers to the uncertainty in the data generated. Data can be messy, which makes it untrustworthy. The data generated requires serious verification to increase predictability in the e-commerce environment (Beuike, 2011). Data verification is essential in data analysis, as bad data can hinder productive decision making and adds little or no value to the business. Beuike (2011) suggested a form of automation for verifying data so that it can be used for better decision making. However, Schroeck, et al. (2012) argued that data fusion could be used to combine various less reliable data sources in order to generate more precise data.

It is important for a business to ensure that it is relying on meaningful data. Cleanliness and accuracy of data are at the core of ensuring that the analysis carried out is reliable (Cao, 2017). According to Menke (2018), numerous tools exist to help ensure that the data being analysed is clean and accurate. With artificial intelligence and tools such as natural language processing, it is possible to achieve this task smoothly. Natural language processing toolkits provide the ability to break down data and identify important aspects such as the parts of speech used and even bias in different kinds of information (Appelbaum, Kogan, Vasarhelyi & Yan, 2017). These models can be trained to ensure that only data which meets certain criteria is analysed (Ashrafi, Ravasan, Trkman & Afshari, 2019). With this form of analysis, it is possible to ensure that the data can be relied upon. As seen in the previous sections, businesses use this kind of data to make important decisions. It would be catastrophic for a business to make decisions using data that is flawed. Therefore, veracity of data is very important. If businesses can use big
data analytics to ensure veracity and the other Vs explained above, then the chances of business success are likely to increase.

2.3 Data Mining

As already defined in the previous chapter, data mining is a form of business intelligence and data analysis. It is the process of digging into large, unstructured data to get useful correlations or predictions from it (Han & Kamber, 2011). David J. Hand (2012) defined data mining as a discipline that interacts with statistics, database technologies, pattern recognition and machine learning, and that is concerned with the secondary analysis of unpredictable relationships in large databases.

Fig. 1. Data mining relationship (Hand, 1998)

Technological developments have made it easy for raw data to be transformed into knowledge that responds to management and marketing needs and creates opportunities for return on investment, which has in turn pushed businesses to invest in data mining (Shearer, 2000).
2.4 Data Mining Steps

Fig. 2 (4)
2.4.1 BUSINESS UNDERSTANDING

To arrive at a proper plan for data mining, it is essential to understand what you are trying to achieve with your business (Peterson, 2018). There are different kinds of information that a business may want to mine, depending on its nature. For instance, a business may want to understand such aspects as the mean age of its loyal customers, the gender distribution of those customers and their geographical location. Such information could help a business to make important decisions.

2.4.2 DATA UNDERSTANDING

The first steps in understanding the data are collecting, loading and integrating all the data you have from all sources into one place. Ensure that the collated data is clean, coherent and consolidated so as to get useful mining results (Peterson, 2018). This step is very important, for it would make little sense to analyse data that one does not understand.

2.4.3 DATA PREPARATION

This is one of the most time-consuming steps in data mining and requires a great deal of attention. At this stage the data is cleaned, extracted and converted into a suitable format so that it can be mined. The outcome of this step is the final data set to be used in modelling (Peterson, 2018). Tools such as a natural language toolkit can be used to help carry out this task. These tools carry out functions such as stop word removal, part-of-speech tagging and other important aspects of preparing data. The process of data preparation is very important, for it helps achieve reliable results (Chiang, Grover, Liang & Zhang, 2018).
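As a rough illustration of the preparation step, the sketch below lower-cases a piece of text, splits it into tokens and removes stop words. It uses only the Python standard library as a stand-in for a full natural language toolkit; the stop word list and the sample review are assumptions made for the example.

```python
import re

# A tiny illustrative stop word list (an assumption for this sketch);
# a real toolkit ships a much larger, language-specific list.
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "of", "to", "it", "this"}

def prepare(text):
    """Lower-case the text, split it into word tokens and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

# Hypothetical customer review used as input.
review = "This is the BEST phone, and it was cheap!"
tokens = prepare(review)
```

After this step only the content-bearing words remain, which is the form most modelling techniques expect as input.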
2.4.4 MODELING

In this stage the modelling techniques to be used are selected for the prepared data set. Test scenarios are created to test the validity of the chosen model (Peterson, 2018). There are different models that can be used to make sense of the available data. These include word embeddings, topic modelling tools and tools that map words to vectors (Seddon, Constantinidis, Tamm & Dod, 2017). Modelling is an important step in helping a computer get the most meaning out of a corpus of data (Vershynin, 2018).

2.4.5 EVALUATION

After the modelling has been done and tests have been carried out, new patterns may emerge from the data mining process, and this may cause the initial business objectives identified in the first step to be re-evaluated and revised. This is a continuous process, as more understanding of the business is gained through data mining (Peterson, 2018). Evaluation is very important as it helps ensure efficiency and accuracy before deployment (Van 2016).

2.4.6 DEPLOYMENT

The business insights obtained from the mined data are presented in such a way that stakeholders can use the information effectively (Peterson, 2018). This is a crucial step in the process and is also the last one. Once the insights have been presented, important decisions can be derived from them.
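The modelling and evaluation steps above can be illustrated with a minimal hold-out test: part of the data is used to fit a model and the remainder is kept back to check it before deployment. The sketch below is not the project's model; the visit counts, the labels and the simple threshold rule are all assumptions chosen to keep the example short.

```python
# Hypothetical records: (site visits in a month, 1 if the customer bought).
rows = [(1, 0), (2, 0), (8, 1), (9, 1), (3, 0), (7, 1), (2, 0), (10, 1)]

def train_test_split(rows, test_fraction=0.25):
    """Deterministic split: the last part of the data is held out for testing."""
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]

def fit_threshold(train):
    """Use the midpoint between the two class means as the decision threshold."""
    bought = [x for x, label in train if label == 1]
    not_bought = [x for x, label in train if label == 0]
    return (sum(bought) / len(bought) + sum(not_bought) / len(not_bought)) / 2

def accuracy(threshold, test):
    """Share of held-out records whose predicted label matches the true label."""
    correct = sum((x > threshold) == bool(label) for x, label in test)
    return correct / len(test)

train, test = train_test_split(rows)
threshold = fit_threshold(train)
score = accuracy(threshold, test)
```

If the held-out accuracy is poor, the process returns to the earlier steps, which matches the iterative nature of evaluation described above.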
2.5 DATA MINING MODEL AND TECHNIQUES

There are several models and techniques in data mining. The most common are discussed below.

2.5.1 ASSOCIATION RULES

This data mining technique helps in finding the association between two or more items. It discovers hidden patterns in a data set. The association rule is similar to the notion of co-occurrence in machine learning, where the likelihood of one event is indicated by the presence of another. The statistical concept of correlation is also similar to the notion of association: analysis of the data shows that there is a relationship between data events.

2.5.2 CLASSIFICATION

Classification is a supervised machine learning approach in which the program learns from the input data in order to classify new observations. In classification, labelled pairs of inputs and outputs from the training data set are used to predict an output for new observations. Classification techniques can be used to detect inconsistencies in marketing data. An example is the recent spike in sales of toilet paper due to the coronavirus. This type of data needs to be excluded from the training set during machine learning so that our algorithm is not based on unrepresentative data.
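To make the idea of association in 2.5.1 concrete, the sketch below computes support and confidence, two standard measures used to score association rules. The shopping baskets are invented for illustration, and the function names are this example's own.

```python
# Hypothetical shopping baskets, one set of items per transaction.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Of the baskets containing the antecedent, the share that also
    contain the consequent: support(A and C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

# Score the rule "customers who buy bread also buy butter".
rule_support = support({"bread", "butter"}, transactions)
rule_confidence = confidence({"bread"}, {"butter"}, transactions)
```

A rule with high support and high confidence is a candidate for actions such as placing the two products together or recommending one when the other is in the basket.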
2.5.3 ARTIFICIAL NEURAL NETWORK (ANN)
The artificial neural network is one of the most used classification methods in machine learning. An ANN is loosely modelled on the way neurons work in the human nervous system, so that it is able to learn from data. The first neural network models date back to the 1940s and 1950s, and they have since achieved wide popularity because of their computational power. An ANN learns from past data and provides responses in the form of predictions or classifications. An advantage of ANN is that it learns from example data sets. Its most common use is function approximation, which offers a cost-effective way of arriving at solutions that define a distribution. To implement a neural network, the scikit-learn library for Python can be used. An ANN has three kinds of layer: the input layer, the hidden layer and the output layer. The input layer, which is the first layer, accepts the inputs in various forms and sends them to the hidden layer in the middle. The hidden layer then performs various mathematical computations on the input data, recognizes patterns in it and sends the result to the output layer.
Fig 3 Neural Network
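The flow of data through the three layers can be sketched in a few lines of plain Python. This is only an illustration of the forward pass, not the scikit-learn implementation mentioned above; the weights, biases and function names are made-up assumptions, and a real network would learn its weights from training data.

```python
import math


def sigmoid(x):
    """Squash a number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """One fully connected layer: a weighted sum per neuron, then sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Input layer -> hidden layer -> output layer."""
    hidden = layer(inputs, hidden_w, hidden_b)
    return layer(hidden, output_w, output_b)


# Hypothetical weights: 2 inputs, 3 hidden neurons, 1 output neuron.
hidden_w = [[0.5, -0.2], [0.1, 0.8], [-0.3, 0.4]]
hidden_b = [0.0, 0.1, -0.1]
output_w = [[0.7, -0.5, 0.2]]
output_b = [0.05]

prediction = forward([1.0, 0.0], hidden_w, hidden_b, output_w, output_b)
print(prediction)
```

The single output value lies between 0 and 1 and could be read as a score for one class, for example whether a visitor is likely to buy.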
A disadvantage of ANN is that there are no specific rules for defining its structure; hence the best structure varies and is found through a series of trials and errors.
2.5.4 DECISION TREE
The decision tree is a decision support tool and one of the most popular classification algorithms used in data mining. It makes use of a tree-like graph or model of decisions and their possible consequences. Applying a decision tree to a given data set produces a set of rules that can be used to classify the data. Applications include evaluating brand expansion using historical data and determining the likelihood of buyers purchasing a product from demographic data, so that limited advertising budgets can be targeted. An advantage of the decision tree is that it can handle both numerical and categorical data; it is also easy to understand and does not require much knowledge of data preparation. A disadvantage is that decision trees are unstable, because a small variation in the data might result in a completely different tree.
2.5.5 RANDOM FOREST
Random forest builds several decision trees on different subsets of the data and averages their results to improve the accuracy of the model. Random forests are more accurate than single decision trees in most cases. However, the algorithm is more complex.
The K-means algorithm is one of the simplest unsupervised learning algorithms for solving clustering problems. The procedure partitions a data set into a chosen number of clusters in a very simple way. The main idea of K-means clustering is to define k centres, one for each cluster. It is mainly used to find groups in data sets that have not been labelled and to find patterns that support better decisions.
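The K-means idea of defining k centres, one per cluster, can be sketched as follows. This is a minimal illustration that assumes two-dimensional points and fixed starting centres so that the result is reproducible; the customer figures and the function name `kmeans` are hypothetical.

```python
def kmeans(points, centers, iterations=10):
    """Plain k-means on 2-D points, starting from the given centres."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centre.
        clusters = [[] for _ in centers]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centers]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centre to the mean of its cluster.
        new_centers = []
        for cluster, old in zip(clusters, centers):
            if cluster:
                new_centers.append((
                    sum(p[0] for p in cluster) / len(cluster),
                    sum(p[1] for p in cluster) / len(cluster),
                ))
            else:
                new_centers.append(old)  # keep an empty cluster's centre
        if new_centers == centers:
            break  # converged: the centres no longer move
        centers = new_centers
    return centers, clusters


# Hypothetical customers: (orders per month, average basket value).
customers = [(1, 20), (2, 22), (1, 18), (9, 80), (10, 85), (8, 78)]
centers, clusters = kmeans(customers, centers=[(1, 20), (9, 80)])
print(centers)
print(clusters)
```

With these figures the occasional, low-value shoppers and the frequent, high-value shoppers end up in separate clusters, even though no labels were supplied.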
2.6 RELATED WORK
2.6.1 Market Basket Analysis
Market basket analysis is a data mining technique. All information related to customers and products is transferred to an electronic medium when sales are processed. The data, usually collected at the point of sale, is called market basket data. Items such as product id, transaction number, quantity and price are stored. The purpose is to find relationships between sales and to come up with plans or rules related to them. Knowing and understanding these relationships can be used to increase company profits.
2.6.1.1 Sales Forecasting
Sales forecasting is used by retailers for inventory control, to answer questions such as when customers will shop again after their last purchase. Here, data mining is used to determine how customers' shopping habits respond to price changes and, in marketing, to analyse the relationships between cross-sales and product
sales. It also helps to determine which products customers are buying according to the customer profile. Sales forecasting is an important part of marketing planning because forecasts are necessary to ensure that marketing decisions are made effectively. Sales forecasts form the basis of marketing programs, budgeting, expense schedules and procurement plans.
2.6.1.2 Customer Profiling
Customer profiling, also known as customer strategy, allows e-commerce businesses to use customer data to plan business activities and operations through data mining. This in turn helps the business develop new products and services to suit the customers' needs. It allows companies to understand their customers' habits while they are online and to identify whether they are shopping or just browsing.
2.6.1.3 Click Stream Data
Click stream data is data gathered from the web, from online advertisements and from the social media content of e-commerce businesses. Online advertisements and social media play an important role in the promotional strategy of businesses, and the click stream data they produce is important for making informed strategic and tactical decisions. Studies have shown that many commerce firms rely on it in their efforts to capture data. This data can further help in predicting customer preferences and tastes. According to research by Davenport and Harris, Netflix captures and analyses about a billion items of web data on the movies that customers like and dislike in order to understand what customers want.
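The relationships sought in market basket analysis (Section 2.6.1) are usually expressed with support, confidence and lift. The sketch below computes these three measures for pairs of items only; the baskets, the minimum support of 0.4 and the function name `pair_rules` are assumptions made for illustration, and a full implementation would also handle larger item sets.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.4):
    """Return (lhs, rhs, support, confidence, lift) for frequent item pairs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in set(basket))
    pair_counts = Counter(
        pair
        for basket in baskets
        for pair in combinations(sorted(set(basket)), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]     # P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n)  # > 1: bought together more than by chance
            rules.append((lhs, rhs, support, confidence, lift))
    return rules


# Hypothetical transactions collected at the point of sale.
baskets = [
    ["laptop", "mouse", "bag"],
    ["laptop", "mouse"],
    ["laptop", "bag"],
    ["mouse", "keyboard"],
    ["laptop", "mouse", "keyboard"],
]
rules = pair_rules(baskets)
for lhs, rhs, support, confidence, lift in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f} "
          f"confidence={confidence:.2f} lift={lift:.2f}")
```

A rule such as "bag -> laptop" with high confidence and a lift above 1 is the kind of relationship a retailer could use for cross-selling.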
2.6.1.4 Use of Data Brain Boxes in the Past
Although data brain boxes are not yet common, there have been several cases where they have been used. One notable example is the Alibaba e-commerce website. Alibaba is an entirely online business that sells products to different parts of the globe. Because of the huge volume of business data it has to deal with, Alibaba has over the years developed a data brain box system to help manage that data (Yuan, 2018). The system is built so that different kinds of analytics can be run on the available data. Consider some specific examples of how this complex system works. Suppose a customer visits the site and searches for a particular product, say a laptop. The system's algorithms gather data from across the website and present a range of products that are relevant to the customer (Zhou, 2017). If the customer does not make a purchase, the system can generate ads tailored to that potential customer and send them to the customer's email or to the app the customer uses to access the site. The system is also able to monitor which products are in high demand. It can forecast the approximate number of products that will be purchased, which helps Alibaba personnel to prepare accordingly. It can also forecast expected future changes in sales. For instance, it may use previous data to predict that certain items are more in demand at a certain time (such as Easter). This information is used to prepare adequately for the future. The system handles many other tasks. It is evident from the examples above that this is quite an advanced data brain box system. It uses various algorithms, most of which are developed using machine learning.
Artificial intelligence is changing the way businesses carry out their activities. Data brain boxes have also been used by major online sites such as YouTube. In the last few years, YouTube has been improved with an intelligent data brain box system (Real, Shlens, Mazzocchi, Pan & Vanhoucke, 2017). The system collects large amounts
of data from users. The intelligent system then uses the data for various purposes. For instance, it can provide a customized playlist to a user based on the content that they have been watching, which helps the user navigate to the videos they are interested in. In addition, the smart system uses data from subscriptions and watch history to determine the kind of adverts to show to a user. For example, suppose a user has been searching for an online programming course. The system can use this data to show the user customized advertisements for such courses. The system therefore benefits the user while helping YouTube to increase revenue.
Amazon is a renowned e-commerce business and one of the world leaders in e-commerce. The company has a well-established system for data management, which could be described as a data brain box because of some of its important features. Take, for example, the chatbots that are integrated within the system. When a customer visits the site and needs help, they may well receive it from a chatbot. The chatbots are programmed to be intelligent and can carry out the role of customer support. The system has also been very handy in helping forecast sales. In addition, it is able to handle large amounts of data efficiently and help the company make important predictions (Varambally, 2020).
Data brain boxes have also been on the rise in international hotels, where they help project the number of customers to expect, the amounts involved and how to prepare for them. In some cases, these systems even help customers to make bookings without human assistance.
CHAPTER 3
3.1 Evaluation Methodology
The main evaluation criteria for determining the effectiveness of our solution are as follows:
The system is able to successfully collect data from E-commerce websites (Magento and WooCommerce).
The system is able to store the collected data on the cloud-based server, Amazon Redshift.
The system is able to give business analysis when queried, that is, give insights on what other shops are selling the most.
The system is able to provide predictive analysis of what sells the most in certain seasons, so as to allow E-commerce site owners to plan ahead.
3.2 Development Methodology
The development of the prototype employs a combination of agile and rapid prototyping methodologies to meet the criteria mentioned above.
CHAPTER 4
4.1 Requirements Analysis
The outcome of this research is a prototype data brain box that provides marketing insight for e-commerce companies in the lifestyle sector.
4.1.1 Functional Requirements
The data brain box should make it possible to analyse vast amounts of data for e-commerce businesses. It should be compatible with operating systems and other computer systems.
4.1.2 Data Collection – Data for the proposed system will be obtained from people who shop online frequently, in order to understand their shopping patterns and habits. Data will also be collected from different online E-commerce sites running on Magento and WooCommerce. A questionnaire will be designed to help collect important information. Its questions will revolve around people's shopping habits on e-commerce websites. In particular, the questionnaire will focus on the aspects that determine the buying habits of customers.
When it comes to the CMS and Magento websites, a special request will be made to them to grant access to their customer service dashboards. Strict ethical standards, such as the maintenance of privacy, will be observed.
4.1.3 Data Storage – The data collected will be stored on Amazon's Redshift cloud-based server, which is a fast, petabyte-scale cloud data warehouse solution. A version of this service is available free of charge for a period of two months. It is a powerful tool, and two months will be more than enough to collect appropriate data.
4.1.4 Data Analytics – The system processes the data obtained from the sources mentioned, picks out consumer trends and classifies each trend for business intelligence.
4.1.5 Predictive Analysis – The data collected in the stages mentioned above will then be used to make market predictions using machine learning techniques. The data will be modelled using a machine learning algorithm, and predictions will be made with one of the algorithms identified in the literature review, several of which are suitable for making such predictions.
4.2 Non-Functional Requirements
Runtime Performance – System response time should be fast.
Usability – The brain box should be very user friendly and easy to use.
The application should work perfectly on any browser.
The application should consume and require minimal data to work.
CHAPTER 5
5.1 PROFESSIONAL, LEGAL AND ETHICAL ISSUES
5.1.1 Professional Issues and Legal Issues
All code, libraries and papers used in this project will be referenced and used according to the terms and conditions of the publishers' licenses. The prototype will be developed, tested and documented according to professional product design practices. Any external information will be cited and referenced accordingly.
5.1.2 Ethical Issues
This research involves human subjects, who will take part by answering questionnaires about their shopping patterns and activities while shopping online. All participants are aware that their personal data will not be shared at any point, and they will be thanked for their effort once they have answered the questionnaires. Therefore, there is no risk of violating any ethical code.
CHAPTER 6
This chapter is concerned with the methodology. It explains the steps that will be followed to make the project a reality. The main goal is to explain the procedural aspects involved in moving the data brain box from a concept to something practical. While the process may be lengthy and may consume a great deal of time and resources, it is achievable, and the result will come in handy for e-commerce businesses, especially those that do not have an online presence.
6.1 PROJECT PLAN
Planning is vital to ensuring that the data brain box project is actualised. To start with, the services of various professionals will be sought. The most important professionals to work with will be software engineers who specialize in artificial intelligence and data analytics. Since the data brain box is aimed at solving a problem pertinent to e-commerce businesses, it will also be important to involve and consult managers and chief executive officers of such businesses. In addition, all the laws concerning data analytics, both local and international, must be well researched so that they can be adhered to when building the data brain box software. It is ethical to ensure that the software will not violate any rights, for example the right to privacy. Good planning therefore means ensuring that all these legal and ethical standards are adhered to.
6.2 METHODOLOGY
To actualize the idea of the data brain box, there is first a need to establish a platform that can integrate data from different sources. The software platform should be able to analyse data from sources such as emails, social media and information collected through a business website. To build the platform, artificial intelligence and tools such as the Natural Language Toolkit will be used. A corpus of data from different sources will be used to train the model. Since the platform will use neural networks, it will have the ability to learn on its own and make appropriate deductions even without the need for prior training. Therefore, once an e-commerce website starts using this platform, only the data pertinent to that particular business will be analysed by the data brain box. Since the brain box will be cross-platform software, it will be usable on platforms such as Windows, Apple and Android. It will also have the ability to integrate and analyse data from different sources and provide important conclusions. The software will also help predict important information such as the status of the stock market and other vital information that a business may be interested in. Furthermore, the software can be programmed further to meet any unique needs that a business may have.
6.3 RISK MANAGEMENT
It is important to predict potential risks and ensure there are countermeasures to deal with them in case they arise. The most pertinent risk to this software is that it is prone to hacking. Cybercrime is now very common, and even some of the best software systems, such as those of banks, are prone to it (Shmueli, Bruce, Yahav, Patel, & Lichtendahl, 2017). One way to minimise this risk is to build a very strong system with no known loopholes.
In addition, there will be constant monitoring by cyber security experts to ensure that all is in place. Another way to ensure more protection is to always use emerging technologies to upgrade the software.
Bibliography
Acemoglu, D., & Restrepo, P. (2018). Artificial intelligence, automation and work (No. w24196). National Bureau of Economic Research.
Ahmed, E., Yaqoob, I., Hashem, I. A. T., Khan, I., Ahmed, A. I. A., Imran, M., & Vasilakos, A. V. (2017). The role of big data analytics in Internet of Things. Computer Networks, 129, 459-471.
Akter, S. & Wamba, S., 2016. Big Data Analytics in E-commerce: a systematic review and agenda for future research. Electron Markets. [Online] Available at: https://doi.org/10.1007/s12525-016-0219-0 [Accessed 1 March 2020].
Alpaydin, E. (2020). Introduction to machine learning. MIT press.
Appelbaum, D., Kogan, A., Vasarhelyi, M., & Yan, Z. (2017). Impact of business analytics and enterprise systems on managerial accounting. International Journal of Accounting Information Systems, 25, 29-44.
Ashrafi, A., Ravasan, A. Z., Trkman, P., & Afshari, S. (2019). The role of business analytics capabilities in bolstering firms’ agility and performance. International Journal of Information Management, 47, 1-15.
Available at: https://files.eric.ed.gov/fulltext/ED536788.pdf [Accessed 1 March 2020].
Aydiner, A. S., Tatoglu, E., Bayraktar, E., Zaim, S., & Delen, D. (2019). Business analytics and firm performance: The mediating role of business process performance. Journal of business research, 96, 228-237.
Bichler, M., Heinzl, A., & van der Aalst, W. M. (2017). Business analytics and data science: once again?
Blum, A., Hopcroft, J., & Kannan, R. (2020). Foundations of data science. Cambridge University Press.
Cao, L. (2017). Data science: a comprehensive overview. ACM Computing Surveys (CSUR), 50(3), 1-42.
Chambers, J. M. (2018). Graphical methods for data analysis. CRC Press.
Chiang, R. H., Grover, V., Liang, T. P., & Zhang, D. (2018). Strategic value of big data and business analytics.
Dai, H. N., Wong, R. C. W., Wang, H., Zheng, Z., & Vasilakos, A. V. (2019). Big data analytics for large-scale wireless networks: Challenges and opportunities. ACM Computing Surveys (CSUR), 52(5), 1-36.
Donoho, D. (2017). 50 years of data science. Journal of Computational and Graphical Statistics, 26(4), 745-766.
Duan, Y., Cao, G., & Edwards, J. S. (2020). Understanding the impact of business analytics on innovation. European Journal of Operational Research, 281(3), 673-686.
Eldén, L. (2019). Matrix methods in data mining and pattern recognition (Vol. 15). Siam.
Gorunescu, F., 2011. Data Mining Concepts, Models and Techniques. s.l.: Springer Science & Business Media.
Gupta, M., & George, J. F. (2016). Toward the development of a big data analytics capability. Information & Management, 53(8), 1049-1064.
Han, J. & Kamber, M., 2011. Data Mining: Concepts and Techniques. San Francisco ed. s.l.: Morgan-Kaufmann Academic Press.
Hand, D., 1998. Data Mining Statistics and More. s.l.: The American Statistician.
Hassabis, D., Kumaran, D., Summerfield, C., & Botvinick, M. (2017). Neuroscience-inspired artificial intelligence. Neuron, 95(2), 245-258.
Hofmann, M., & Klinkenberg, R. (Eds.). (2016). RapidMiner: Data mining use cases and business analytics applications. CRC Press.
Jackson, P. C. (2019). Introduction to artificial intelligence. Courier Dover Publications.
Kubick, W. 2012. Big Data, Information and Meaning In: Clinical Trial Insights. s.l.:s.n.
Laursen, G. H., & Thorlund, J. (2016). Business analytics for managers: Taking business intelligence beyond reporting. John Wiley & Sons.
Leskovec, J., Rajaraman, A., & Ullman, J. D. (2020). Mining of massive data sets. Cambridge university press.
Lowndes, J. S. S., Best, B. D., Scarborough, C., Afflerbach, J. C., Frazier, M. R., O’Hara, C. C., & Halpern, B. S. (2017). Our path to better science in less time using open data science tools. Nature ecology & evolution, 1(6), 1-7.
Lu, H., Li, Y., Chen, M., Kim, H., & Serikawa, S. (2018). Brain intelligence: go beyond artificial intelligence. Mobile Networks and Applications, 23(2), 368-375.
Marjani, M., Nasaruddin, F., Gani, A., Karim, A., Hashem, I. A. T., Siddiqa, A., & Yaqoob, I. (2017). Big IoT data analytics: architecture, opportunities, and open research challenges. IEEE Access, 5, 5247-5261.
McAfee, A. & Brynjolfsson, E., 2012. Big Data The management revolution. s.l.: Harvard Business Review.
Menke, W. (2018). Geophysical data analysis: Discrete inverse theory. Academic press.
Migrant & Seasonal Head Start Technical Assistant Center, n.d. Introduction to Data Analysis Handbook. [Online]
Mikalef, P., Pappas, I. O., Krogstie, J., & Pavlou, P. A. (2019). Big data and business analytics: A research agenda for realizing business value.
Mohri, M., Rostamizadeh, A., & Talwalkar, A. (2018). Foundations of machine learning. MIT press.
Pappas, I. O., Mikalef, P., Giannakos, M. N., Krogstie, J., & Lekakos, G. (2018). Big data and business analytics ecosystems: paving the way towards digital transformation and sustainable societies.
Peterson, R., 2018. 6 essential steps in data mining process. s.l.:s.n.
Real, E., Shlens, J., Mazzocchi, S., Pan, X., & Vanhoucke, V. (2017). YouTube-BoundingBoxes: A large high-precision human-annotated data set for object detection in video. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 5296-5305).
Roiger, R. J. (2017). Data mining: a tutorial-based primer. CRC press.
Russom, P., 2011. Big Data Analytics. s.l.: TDWI Best Practices Report.
Seddon, P. B., Constantinidis, D., Tamm, T., & Dod, H. (2017). How does business analytics contribute to business value? Information Systems Journal, 27(3), 237-269.
Shearer, C., 2000. The Crisp-dm model: The new blueprint for data mining. s.l.: J Data Warehouse.
Shmueli, G., Bruce, P. C., Yahav, I., Patel, N. R., & Lichtendahl Jr, K. C. (2017). Data mining for business analytics: concepts, techniques, and applications in R. John Wiley & Sons.
Steels, L., & Brooks, R. (Eds.). (2018). The artificial life route to artificial intelligence: Building embodied, situated agents. Routledge.
Tan, P. N., Steinbach, M., & Kumar, V. (2016). Introduction to data mining. Pearson Education India.
Tegmark, M. (2017). Life 3.0: Being human in the age of artificial intelligence. Knopf.
Van Der Aalst, W. (2016). Data science in action. In Process mining (pp. 3-23). Springer, Berlin, Heidelberg.
Varambally, K. V. M. (2020). “Sustainability and Amazon”–A Case Study on Amazon Company. Our Heritage, 68(1), 12094-12100.
Vershynin, R. (2018). High-dimensional probability: An introduction with applications in data science (Vol. 47). Cambridge university press.
Vidgen, R., Shaw, S., & Grant, D. B. (2017). Management challenges in creating value from business analytics. European Journal of Operational Research, 261(2), 626-639.
Wamba, S. F., Gunasekaran, A., Akter, S., Ren, S. J. F., Dubey, R., & Childe, S. J. (2017). Big data analytics and firm performance: Effects of dynamic capabilities. Journal of Business Research, 70, 356-365.
Yuan, Y. (2018). Alibaba Group: Development and Influence (No. 2018-26-13).
Zhou, J. (2017). Big data analytics and intelligence at Alibaba cloud. In Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems.