CST4340: Data Management for Decision Support - Analysis Report

Verified

Added on  2022/09/07

|21
|3110
|16
Report
AI Summary
This report presents a data analysis project utilizing data mining techniques to analyze the trading history of British American Tobacco PLC (BATS) over a five-year period. The project employs linear regression and artificial neural networks (ANN) with the Weka data mining tool to explain and predict the data. The report details the dataset description, including dependent and independent variables. It explores the analytic models, explaining the objectives, terminologies, strengths, and weaknesses of both linear regression and ANN. The methodology includes data randomization, splitting, and training for both models. The results of multiple linear regression and ANN are presented, along with observations of ANN training results. The evaluation includes correlation coefficient, root mean squared error (RMSE), and mean absolute error (MAE). The report concludes with a decision support analysis and provides recommendations for improving the company’s trading view. The assignment fulfills the requirements of the CST4340 coursework, focusing on applying data-mining techniques to a real dataset and evaluating the results to provide effective decision support.
Document Page
DATA MANAGEMENT FOR DECISION SUPPORT
tabler-icon-diamond-filled.svg

Paraphrase This Document

Need a fresh take? Get an instant paraphrase of this document with our AI Paraphraser
Document Page
Table of Contents
1. Introduction.......................................................................................................................1
2. Introduction of Dataset, Problems, and Methods..........................................................1
2.1 Dataset Description.......................................................................................................1
2.2 Description of Dataset Variables.................................................................................2
2.2.1 Dependent and Independent Variables................................................................2
2.3 Case Description............................................................................................................2
2.4 Software Requirement..................................................................................................3
3. Analytic Models.................................................................................................................3
3.1 Linear Regression..........................................................................................................3
3.1.1 Objectives Linear Regression...............................................................................3
3.1.2 Multiple Linear Regression Terminologies.........................................................3
3.1.3 Strength and Weaknesses of Linear Regression.................................................4
3.2 Artificial Neural Networks (ANN)...............................................................................4
3.2.1 Learning Process....................................................................................................4
3.2.2 Objectives of ANN.................................................................................................4
3.2.3 Concept of ANN.....................................................................................................5
3.2.4 Feed Forward Neural Network Description of Components.............................5
3.2.5 Description of Components: FFNN......................................................................5
3.2.6 Strengths and Weaknesses of ANN......................................................................6
4. Data Analysis and Results................................................................................................6
4.1 Data Randomization.....................................................................................................6
4.2 Splitting and Training...................................................................................................9
4.3 Multiple Linear Regression with Weka (MLR).......................................................10
4.4 Artificial Neural Network with Weka (ANN)...........................................................12
4.5 Observation of ANN Training Results......................................................................14
5. Evaluation........................................................................................................................16
5.1 Correlation Coefficient...............................................................................................16
5.2 Root Mean Squared Error (RMSE)..........................................................................16
5.3 Mean Absolute Error (MAE).....................................................................................16
5.4 Summary Result for both MLR and ANN................................................................16
6. Decision Support.............................................................................................................17
7. Conclusion.......................................................................................................................17
References...............................................................................................................................18
Document Page
1. Introduction
Artificial neural network is commonly known as a computational model which works
and looks like the biological neural network's structure and functions (What is Artificial
Neural Network - Structure, Working, Applications, 2018). The neural networks could be
employed as a data analysis tools to forecast and predict depending on the historical data in a
data-driven Decision Support System. On the other hand, the neural networks could also be
viewed as the quantitative models for being utilized in the model-driven Decision Support
System.
The main objective of this project is to apply the data mining techniques to analyze
the real dataset and evaluate the results. This project uses BATS BRITISH AMERICAN
TOBACCO PLC ORD 25P dataset, and also takes help of linear regression and artificial
neural network method for evaluating the results with the assistance of Weka data mining
tool. Here, data analysis is used to explain and predict the selected real data and possible
phenomena behind it. Finally, it will analyze and summarize the results to provide effective
decision support and conclusion.
2. Introduction of Dataset, Problems, and Methods
2.1 Dataset Description
This project uses British American Tobacco PLC, a holding company dataset, where
the last five years data is selected (British American Tobacco p.l.c. (BATS.L), 2020). The
dataset is represented in the below figure.
1
Document Page
2.2 Description of Dataset Variables
The provided dataset contains the following variables:
Date
Open
High
Low
Close
Adj. close
Volume
2.2.1 Dependent and Independent Variables
This dataset contains the following dependent and independent variables:
Dependent variable: Volume
Independent variables: Date, Open, High. Low, close, and Adj. Close.
2.3 Case Description
The British American Tobacco PLC company is a multi-category consumer good
company and it provides nicotine and tobacco products. This project uses this company data
to analyze its last five years data, with the help of linear regression and artificial neural
network method. The results are evaluated by using Weka data mining tool (Han, Kamber
and Pei, 2012). This tool explains and predicts the company’s data and helps in analyzing and
2
tabler-icon-diamond-filled.svg

Paraphrase This Document

Need a fresh take? Get an instant paraphrase of this document with our AI Paraphraser
Document Page
summarizing the results to take effective decision support for improving the company’s
trading view.
2.4 Software Requirement
This project makes use of Weka tool, which is a data mining tool for analyzing the
selected data. Weka tools refers to a collection of machine learning algorithms used to
resolve the real-world data mining issues (Witten, Frank, Hall and Pal, 2017). This tool is
programmed in Java and executes on any platform. The algorithms could either be directly
applied to the dataset or it can be called using your Java code. It facilitates various tools for
data pre-processing, to implement various Machine Learning algorithms, and gives access of
visualization tools for developing machine learning techniques, which are applied on the real-
world data mining issues (Stahlbock, Abou-Nasr and Weiss, 2018).
3. Analytic Models
3.1 Linear Regression
3.1.1 Objectives Linear Regression
Linear regression models are used for prediction purpose, where the regression
models are used for the inference and prediction purpose. The predictive goal ensures to
evaluate the model’s performance on a validation set and for using the predictive metrics
(Xanthopoulos, Pardalos and Trafalis, 2013).
3.1.2 Multiple Linear Regression Terminologies
Multiple linear regression (MLR) is even called as a multiple regression, which is a
statistical technique utilizing various explanatory variables for predicting the response
variable’s outcome. MLR’s goal includes modeling linear relationship between the
explanatory (independent) variables and response (dependent) variable (Modern Machine
Learning Algorithms: Strengths and Weaknesses, 2019).
A simple linear regression denotes a function which permits an analyst or statistician
for making the predictions about one variable depending on the information known about the
other variables (Hastie, Friedman and Tisbshirani, 2017). Linear regression could just be
utilized when one contains two continuous variables—an independent variable and a
dependent variable. The independent variable refers to a parameter which is utilized for
calculating the outcome or the dependent variable.
3
Document Page
3.1.3 Strength and Weaknesses of Linear Regression
Strengths:
It could be regularized for avoiding over fitting.
The linear models could be easily updated with new data with the help of
stochastic gradient descent.
Weaknesses:
Linear regression has poor performance, when there are non-linear relationships.
Linear regression lacks natural flexibility for capturing highly complicated patterns.
It is time consuming and tricky to add the right interaction terms or polynomials.
3.2 Artificial Neural Networks (ANN)
3.2.1 Learning Process
Learning rule or learning process of an Artificial neural network refers to a method,
mathematical logic or algorithm that improvises the performance and training time of
network. In general, this rule is frequently implemented on a network, by updating
the network levels of weights and bias when a network is simulated in a particular data
environment.
Learning rule might accept the present network conditions i.e., weights and biases, and
compares the expected results with the actual network results for providing new and
improved values for weights and bias (Introduction to Learning Rules in Neural Network,
2018). The learning rule addresses the factors which helps to decide how fast or accurately
the artificial network could be developed. For developing a network, it requires the following
three main machine learning models (Perner, 2015):
Unsupervised learning
Supervised learning
Reinforcement learning
3.2.2 Objectives of ANN
Artificial Neural Networks is abbreviated as ANN, and it is a computational model.
The following are the objectives of ANN:
1) This computational model is developed using the biological neural network’s
structures and functions, for better functioning. Though, the ANN’s structure
4
Document Page
depends on the information flow, but the neural network’s changes depend on
the input and output.
2) It is possible to assume ANN to be a nonlinear statistical data, which refers to
a complex relationship defined between the input and output. Thus, various
different patterns can be found.
3.2.3 Concept of ANN
Neural networks can be referred as the parallel computing devices that tries to
develop a brain like functioning computer model. The major objectives of ANN includes
developing a system for performing different computational tasks, which are faster when
compared to the traditional systems. Such tasks comprises of data clustering, approximation,
pattern recognition, pattern classification, and optimization.
3.2.4 Feed Forward Neural Network Description of Components
In this network, the information flow is considered to be unidirectional. A unit is just
utilized for sending the information to other unit which don’t actually receive any kind of
information. On the other hand, it doesn’t contain any feedback loops. But, it is utilized for
pattern recognition, and has fixed inputs and outputs.
3.2.5 Description of Components: FFNN
Feedforward neural network (FFNN) refers to a machine learning classification
algorithm, which comprises of organized layers that looks just like the human neuron
processing units. In FFNN, each unit of a layer is linked with the rest of its units. These
layer’s connections with the units are not equal, due to varying weight or strength of each
connection. The network connection’s weight measures the potential amount of network
5
tabler-icon-diamond-filled.svg

Paraphrase This Document

Need a fresh take? Get an instant paraphrase of this document with our AI Paraphraser
Document Page
knowledge. Moreover, the NN units are even referred as the nodes. In a network, information
processing contains data entry from the input units and passes via network, and flows from
one layer to the other layer till it reaches the output units.
3.2.6 Strengths and Weaknesses of ANN
The strengths of ANN are listed below (M. Mijwil, 2018):
It has the capacity of storing information of complete network.
It is capable of working with incomplete knowledge/information.
It has fault tolerance.
It contains a distributed memory.
It ensures gradual corruption.
It has the capacity of making machine learning.
Also, it has the capability of parallel processing.
The weaknesses of ANN are listed below (Artificial Neural Networks Advantages and
Disadvantages, 2020):
It has hardware dependency.
It also experiences unexplained network behaviour.
It can determine a proper network structure.
It faces issues to show the network problem.
The network duration is not known.
4. Data Analysis and Results
4.1 Data Randomization
For data analysis, firstly carry out data randomization. To perform data randomization
utilize the following steps:
First, open the data on Weka as demonstrated in the below screenshot (Kaluza, 2013).
6
Document Page
Afterwards view the pre- processing tab, click on Filter Choose Unsupervised
Instance Randomize. It is demonstrated in the following screenshot.
Click on Apply to apply randomize on the data as presented below.
7
Document Page
Next, convert all the strings to nominal for all the string attributes by clicking on
Filter Choose Unsupervised Attributes String to Nominal. This process is
demonstrated in the below screenshot.
8
tabler-icon-diamond-filled.svg

Paraphrase This Document

Need a fresh take? Get an instant paraphrase of this document with our AI Paraphraser
Document Page
4.2 Splitting and Training
For Training set,
Click on Filter Choose Unsupervised Instance Remove percentage. Here,
enter remove percentage as 60.0. This step is demonstrated in the following screenshot.
For Testing set,
9
Document Page
Click on Filter Choose unsupervised instance remove percentage. Invert
selection as true and then save the file as Testing Data set.
Open the training data set, perform linear regression and artificial neural network.
4.3 Multiple Linear Regression with Weka (MLR)
To perform multiple linear regression on Weka click on Classify Functions
Linear regression as shown below (Mavroforakis, 2011).
10
chevron_up_icon
1 out of 21
circle_padding
hide_on_mobile
zoom_out_icon
[object Object]