SHU AI Coursework: Bacteria Data Classification with Neural Networks

Added on 2023/06/13

AI Summary

This report details the classification of bacteria data using neural networks implemented in MATLAB r2017a. The study focuses on five classes of bacteria and utilizes the classification learner application within MATLAB to train and test the data. The report covers the introduction to neural networks, their application in data classification, requirements analysis, design considerations, implementation and testing procedures, and a thorough evaluation of the results. The back-propagation algorithm is employed for training, and the performance of the classifier is assessed using metrics such as error histograms, performance scales, confusion matrices, and ROC curves. The study highlights the potential of neural networks in medical diagnosis and other applications requiring pattern recognition and decision-making based on complex datasets.

SHEFFIELD HALLAM UNIVERSITY
FACULTY OR DEPARTMENT
55-700241 APPLICABLE ARTIFICIAL INTELLIGENCE
FIRST SIT COURSEWORK
DATA CLASSIFICATION USING NEURAL NETWORKS
ACADEMIC SESSION 2017-2018
STUDENT NAME
STUDENT REGISTRATION NUMBER
DATE OF SUBMISSION

ABSTRACT
This paper classifies bacteria data obtained from a laboratory experiment. The analysis is
performed using the Classification Learner application of MATLAB R2017a. Neural networks are a
key branch of artificial intelligence (Helena, et al., 2003). Rather than being explicitly
programmed to perform a definite task, such a system learns from the collected data how to
execute it. Training with different inputs improves the performance of the classifier, so that
its outputs for the bacteria electrochemical-reaction data set are as close as possible to the
set targets. The evaluation section presents the errors plotted as a histogram, the performance
plots, the confusion matrices, and the ROC curves.

TABLE OF CONTENTS
ABSTRACT
INTRODUCTION
REQUIREMENTS ANALYSIS
DESIGN CONSIDERATIONS
IMPLEMENTATION AND TESTING
EVALUATION
CONCLUSION
BIBLIOGRAPHY
APPENDIX

INTRODUCTION
There are five classes of bacteria under study in this paper. An artificial neural network
(ANN) is made up of many artificial neurons that are connected together according to an
explicit network architecture. The objective of the neural network is to convert the inputs
into meaningful outputs. The learning mode can be supervised or unsupervised.
Escherichia coli is a Gram-negative rod of the family Enterobacteriaceae. Most of these
bacteria are found in the intestinal tract. Enterohemorrhagic E. coli (EHEC) is a subset of
pathogenic E. coli that can cause diarrhea or hemorrhagic colitis in humans. Hemorrhagic
colitis occasionally progresses to hemolytic uremic syndrome, which is an important cause of
acute renal failure in children and of morbidity and mortality in adults. The pathogenic
strains of the organism are distinguished from the normal flora by their possession of
virulence factors such as exotoxins.
Figure 1 Colorized scanning electron micrograph depicting Escherichia coli (CDC Public
Health Image Library)
The bacteria data were obtained from a medical chart and are represented as multidimensional
datasets. Classifying and clustering the dataset is a significant step in the study and
analysis of the data (Cacoullos).

Data Classification in Neural Networks
Neural networks are a key branch of artificial intelligence (Helena, et al., 2003). Rather
than being explicitly programmed to perform a definite task, such a system learns from the
collected data how to execute it. For the data provided on the five classes of bacteria, an
ANN can quickly establish a pattern in the data that may be useful in decision making or in
drawing a conclusion about a given behavior based on the phenomenon (R.Furferi, 2011). The
models are pragmatic, and they are useful in medical diagnosis, where artificial intelligence
techniques draw relationships from very dissimilar data. Key applications of neural networks
include fault detection in systems, product inspection, speech recognition, and bankruptcy
prediction in financial systems (Yu-guo & Hua-peng, 2010).

A number of algorithms can be employed to train and test the data to be classified, such as
the back-propagation neural network (BPNN) and the multilayer feedforward network (Hajmeer M.
B., 2006). In this case, a multilayer feedforward neural network classifies the 37 instances
of test data into the separate classes. The technique is limited to a given range of
performance, cost-benefit analysis, and implementation (Saravanan & Sasithra, 2014). The major
difficulty in using an ANN is finding the most appropriate combination of training, learning,
and transfer functions for classifying data sets with a growing number of features and
classes. The most widely preferred method for ANN dataset classification is the
back-propagation algorithm (Zhang, 2000), which is reported to offer the best combination of
training, learning, and transfer functions for dataset classification. Unfortunately, the
combination does not support very large data sets. The data set in this paper covers only 37
samples, which are tested over different scales in the electrochemical reaction. The use of
the back-propagation algorithm for dataset classification is examined by Priyadarshini (2010).
The BPNN is considered to have high predictive ability, with stable and well-functioning
constructs that are useful in classifying the data.

Figure 2 The architecture of the Back-propagation ANN Model
The ANN uses a number of parameters jointly to distinguish between classes instead of using
one parameter independently. One need not establish a mathematical evaluation model of the
various parameters, as the tool handles that efficiently. The ANN performs machine learning,
which can substitute for the human brain in completing recognition tasks (Geeraerd, 2008). The
system is easy to apply in real-time systems as it works well at high processing speeds. The
system will learn from the experimental data set values, which will be used to determine the
reaction status of the bacteria data. The support vector machine (SVM) is another kind of
learning machine, built on the central concept of a kernel, that can be applied to a number of
learning tasks (Hajmeer M. B., 2000). Kernel machines provide a modular framework that can be
adapted to different tasks and domains by using different kernel functions. SVMs perform well
in solving classification and regression problems. ANNs have been successfully applied to data
observations from a small watershed, consisting of commonly measured indicator bacteria,
weather conditions, and turbidity, to distinguish human sewage from animal-impacted runoff,
fresh runoff from aged runoff, and agricultural land-use-associated fresh runoff from suburban
land-use-associated fresh runoff.
Figure 3 SVM Architecture
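The modularity of kernel machines can be illustrated with a short Python sketch. It is not a full SVM and is not part of the MATLAB experiment: it only shows that the kernel function can be swapped while the surrounding code stays unchanged. The function names, the gamma value, and the three sample points are hypothetical and chosen purely for illustration.

```python
import math

def linear_kernel(a, b):
    """Plain dot product of two feature vectors."""
    return sum(x * y for x, y in zip(a, b))

def rbf_kernel(a, b, gamma=0.5):
    """Gaussian (RBF) kernel; gamma is an illustrative choice."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def gram_matrix(points, kernel):
    """Pairwise kernel values; a kernel machine only sees the data through this."""
    return [[kernel(p, q) for q in points] for p in points]

# Made-up two-dimensional points, not taken from the bacteria data.
points = [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
K_linear = gram_matrix(points, linear_kernel)
K_rbf = gram_matrix(points, rbf_kernel)
```

Only the argument passed to gram_matrix changes between the two cases, which is the sense in which the framework is modular.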
The procedure adopted in the ANN is:
(i) It learns from the training data set values and recognizes different patterns in the
electrochemical reaction as represented by the bacteria data. It takes a series of input
feature vectors containing many parameters and outputs the different reactions without
making any assumptions about their nature and interrelations (Jeyamkondan, 2001).
REQUIREMENTS ANALYSIS
Functional requirements
(i) To design, implement and evaluate a neural network for data classification
(ii) To analyze different parameters on the bacteria data as developed from the neural
network.

Non-functional requirements
(i) The neural network model must be implemented according to the given specifications,
and a different script may be invoked as a function.
(ii) The artificial neural network needs a robust error-handling scheme to ensure that the
data is analyzed properly and that any errors encountered are handled in the system.
(iii) The training data is usually two-thirds of the entire data set, and the remaining
third is reserved for testing. The network is simulated using the same data. In this
experiment, the back-propagation algorithm is used to train the neural network.
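The two-thirds / one-third split in requirement (iii) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the MATLAB procedure used in the experiment: the function name, the fixed random seed, and the use of integer indices in place of the 37 real instances are all hypothetical.

```python
import random

def split_two_thirds(samples, seed=0):
    """Shuffle a copy of the samples and return (training, testing) lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)      # seeded so the split is repeatable
    n_train = round(len(shuffled) * 2 / 3)     # two-thirds for training
    return shuffled[:n_train], shuffled[n_train:]

# Placeholder indices 0..36 standing in for the 37 instances.
train_set, test_set = split_two_thirds(range(37))
print(len(train_set), len(test_set))           # 25 12
```

With 37 instances the split cannot be exact, so rounding gives 25 training and 12 testing samples.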
Expected performance
The data collected should meet the target pattern for proper classification. The probability
analysis seeks to characterize the electrochemical reaction involving the bacteria. Each
sample should fit its given class, and each prediction either hits or misses its target. The
aim is to hit as many correct targets as possible for improved performance. Hence the targets
are represented as 0 or 1, denoting correct and incorrect targets respectively.
DESIGN CONSIDERATIONS
The data used in the testing and training of the artificial neural network is available in 37
instances. There are 5 classes of bacteria used in the analysis, namely:
(i) Escherichia coli
(ii) E-coli
(iii) Staphylococcus aureus
(iv) Staphaureus
(v) Serratia marcescens
The 37 instances of data collected on the 5 classes were used as the initial training set in
the neural network system. The neural network was used to analyze the bacteria data in
MATLAB/Simulink R2017a. The ANN model for each of the 5 classes contains an input layer, a
hidden layer, and one output layer. The ANN architecture is developed by organizing nodes into
layers. These layers are then linked to each other with modifiable weighted interconnections.
For this study there are 5 inputs and 2 expected outputs with a number of hidden layers. The
fully connected topology has 5 input nodes, 25 hidden weights, and 2 output nodes. An extra
node is added to represent the bias.
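The topology described above can be sketched as a single forward pass. This is a plain-Python illustration rather than the MATLAB model itself. It assumes one hidden layer of 5 nodes, which is one reading of the "25 hidden weights" (5 inputs x 5 hidden nodes), a log-sigmoid transfer function on both layers, and a made-up input vector; none of these values comes from the experimental data.

```python
import math
import random

def logsig(x):
    """Log-sigmoid transfer function, mapping any real x into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def make_layer(n_in, n_out, rng):
    """One row per node: n_in small random weights plus a trailing bias weight."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]

def forward(layer, inputs):
    """Apply one fully connected layer; the extra bias input is fixed at 1."""
    extended = list(inputs) + [1.0]
    return [logsig(sum(w * x for w, x in zip(row, extended))) for row in layer]

rng = random.Random(0)
N_INPUTS, N_HIDDEN, N_OUTPUTS = 5, 5, 2      # N_HIDDEN = 5 is an assumption
hidden_layer = make_layer(N_INPUTS, N_HIDDEN, rng)
output_layer = make_layer(N_HIDDEN, N_OUTPUTS, rng)

sample = [0.1, 0.4, 0.3, 0.9, 0.5]           # made-up feature vector
outputs = forward(output_layer, forward(hidden_layer, sample))
n_hidden_weights = N_INPUTS * N_HIDDEN       # 25, not counting bias weights
print(outputs, n_hidden_weights)
```

Storing the bias as a trailing weight on a constant input of 1 mirrors the extra bias node mentioned in the text.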
IMPLEMENTATION AND TESTING
Coding decisions
This paper places great importance on ensuring that the experiment uses the correct exemplars
for training and testing. There are 5 sets of data depicting different bacteria classes. The
samples taken in this experiment fill the sample space as required by the researcher. A neural
network is far better at interpolation than at extrapolation, so the network needs to be
trained with samples that are evenly spaced over the required range of data set values. The
first five samples of data, listed below, are used to train the network.
Sa0907 - Staphylococcus aureus
Sa1704 - Staphaureus
Ec1104 - Escherichia coli
Ec1404 - E. coli
Sm131 - Serratia marcescens
Figure 4 Algorithm used in bacteria classification technique
The data collected was based on actual data from the bacteria population. The data used to
train the network is collected, and the concentration of the determinant is ascertained from
the electrochemical process. The data set is used for training and testing to determine the
outcome of the electrochemical process. The training procedure is as follows:
(i) Choose an algorithm for training the artificial neural network.
(ii) Set the connection weights to small random values, including the weights connecting
the bias to the hidden and output layers.
(iii) Apply the training data to the network and let it run until an output is produced at
each output node. The difference between the actual output and the expected output is
fed back through the network in the reverse direction, with the back-propagation
algorithm determining the signal flow.
(iv) The back-propagation algorithm computes the gradient of the error surface at the
current location. The gradient descent method, together with TRAINLM
(Levenberg-Marquardt training), LEARNGDM (gradient descent with momentum learning),
and LOGSIG (log-sigmoid transfer function), is used to perform the training, learning,
and testing activities in the neural network. The gradient descent method, for
instance, is used to decrease the mean squared error between the network output and
the target output (S.Prabakar, K.Porkumaran, & Isaac, 2010).
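Steps (ii) to (iv) can be sketched in plain Python as below. This is a hedged illustration, not the MATLAB code used in the experiment: it applies gradient descent with momentum (the idea behind LEARNGDM) and a log-sigmoid transfer function (LOGSIG), and it does not reproduce the Levenberg-Marquardt training of TRAINLM. The toy data set (a logical OR), the hidden-layer size, the learning rate, the momentum, and the epoch count are all assumptions chosen so the example runs quickly.

```python
import math
import random

def logsig(x):
    return 1.0 / (1.0 + math.exp(-x))

def init(n_in, n_out, rng):
    # Step (ii): small random weights; the last column is the bias weight.
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]

def forward(layer, x):
    xb = x + [1.0]                             # bias input fixed at 1
    return [logsig(sum(w * v for w, v in zip(row, xb))) for row in layer]

def train(data, n_hidden=3, lr=0.2, momentum=0.8, epochs=2000, seed=0):
    rng = random.Random(seed)
    n_in, n_out = len(data[0][0]), len(data[0][1])
    w1, w2 = init(n_in, n_hidden, rng), init(n_hidden, n_out, rng)
    v1 = [[0.0] * (n_in + 1) for _ in range(n_hidden)]    # momentum terms
    v2 = [[0.0] * (n_hidden + 1) for _ in range(n_out)]
    history = []                               # mean squared error per epoch
    for _ in range(epochs):
        sq_err = 0.0
        for x, t in data:
            h = forward(w1, x)
            y = forward(w2, h)
            # Step (iii): output error, propagated back to the hidden layer.
            d2 = [(yk - tk) * yk * (1 - yk) for yk, tk in zip(y, t)]
            d1 = [hj * (1 - hj) * sum(d2[k] * w2[k][j] for k in range(n_out))
                  for j, hj in enumerate(h)]
            # Step (iv): gradient descent with momentum on every weight.
            for w, v, d, inp in ((w2, v2, d2, h + [1.0]), (w1, v1, d1, x + [1.0])):
                for k, dk in enumerate(d):
                    for j, xj in enumerate(inp):
                        v[k][j] = momentum * v[k][j] - lr * dk * xj
                        w[k][j] += v[k][j]
            sq_err += sum((yk - tk) ** 2 for yk, tk in zip(y, t))
        history.append(sq_err / (len(data) * n_out))
    return w1, w2, history

# Toy data (logical OR), not the bacteria measurements.
data = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]),
        ([1.0, 0.0], [1.0]), ([1.0, 1.0], [1.0])]
w1, w2, history = train(data)
print(history[0], history[-1])                 # error at first and last epoch
```

The hidden-layer deltas are computed before any weight is changed, so the error is propagated back through the same weights that produced the output.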
The data used in the experiment is based on the bacteria reactions to an electrochemical
process and is organized into one file per class. Files such as ec1104 and sm1310 each hold
the full experimental data for bacteria of that class, giving about 37 measurements for the 5
classes. Accurate and fast bacteria classification is extremely important, as it may enable
early diagnosis of severe diseases and thus make them possible to control (Ying, Zhiye, &
Jianping, 2011). Consequently, it is very important to replace the manual procedure with a new
bacteria classification technique based on pattern recognition in order to achieve results
that are more accurate, faster, and more efficient. Existing techniques in the literature do
not produce satisfactory results in bacteria identification (O.Richard, Peter, & David, 2000).
Test cases