
MACHINE LEARNING
HYBRID METHOD ANALYSIS USING ANN, SVM AND DT
STUDENT ID NUMBER
STUDENT NAME

INTRODUCTION
In machine learning, a model is first fitted on a training portion of the data and then evaluated on the remaining held-out portion. In this modelling case, the dataset provided consists of 699 rows, of which 80 percent is used for training and 20 percent for testing the developed model. The techniques employed in the analysis are the Artificial Neural Network (ANN), the Support Vector Machine (SVM) and the Decision Tree (DT). The neural network reaches an accuracy of about 80 percent, while the support vector machine reaches 85 percent. The methods are combined to raise the overall accuracy of the project, whose aim is to identify the true positives and the false negatives in the data. The ANN is the base technique, the SVM builds upon it, and the DT completes the hybrid, giving an accuracy of 98.08 percent. The SVM is a linear classifier that builds a hyperplane-based decision rule from the dataset: it computes a weighted sum of the explanatory variables and assigns a class according to whether this sum falls below or above a set threshold. During ANN training, a set of rules can be extracted from the trained network, which turns the otherwise opaque model into a more transparent one.
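As an illustration of the decision rule described above, a minimal sketch is given below; the weights, bias, threshold and class names are hypothetical values chosen for illustration and are not taken from the trained model in this report.
% Minimal sketch of a linear SVM-style decision rule (hypothetical values)
w = [0.8; 0.3; 1.1];        % weights of three explanatory variables
b = -2.5;                   % bias term
x = [4; 1; 3];              % one sample of feature values
score = w' * x + b;         % linear weighted sum of the explanatory variables
threshold = 0;              % decision threshold
if score > threshold
    label = 'malignant';    % sum above the threshold -> one class
else
    label = 'benign';       % sum below the threshold -> the other class
end
disp(label)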
MODELLING AND ANALYSIS
The first step is data training. The MATLAB script imports the data into the software and sets 80 percent of the rows aside as training data; the remaining 20 percent is reserved for testing. The code snippet below shows the data loading and segmentation:
%% Import the data
[~, ~, raw] = xlsread('/home/pina/Documents/order809794/data01.xlsx','Sheet1');
raw = raw(2:end,:);
raw(cellfun(@(x) ~isempty(x) && isnumeric(x) && isnan(x),raw)) = {''};
stringVectors = string(raw(:,11));
stringVectors(ismissing(stringVectors)) = '';
raw = raw(:,[1,2,3,4,5,6,7,8,9,10]);
%% Exclude rows with non-numeric cells
I = ~all(cellfun(@(x) (isnumeric(x) || islogical(x)) && ~isnan(x),raw),2); % Find rows with non-numeric cells
raw(I,:) = [];
stringVectors(I,:) = [];
%% Create output variable
data = reshape([raw{:}],size(raw));
%% Allocate imported array to column variable names
ID = data(:,1);
CLUMPTHICKNESS = data(:,2);
UNIFORMITYOFCELLSIZE = data(:,3);
UNIFORMITYOFCELLSHAPE = data(:,4);
MARGINALADHESION = data(:,5);
SINGLEEPITHELIALCELLSIZE = data(:,6);
BARENUCLEI = data(:,7);
BLANDCHROMATIN = data(:,8);
NORMALNUCLEOLI = data(:,9);
MITOSES = data(:,10);
CLASS = categorical(stringVectors(:,1));
%% Clear temporary variables
clearvars data raw stringVectors I;
%% disease information
% disease=[BARENUCLEI BLANDCHROMATIN CLUMPTHICKNESS MARGINALADHESION MITOSES NORMALNUCLEOLI SINGLEEPITHELIALCELLSIZE UNIFORMITYOFCELLSHAPE UNIFORMITYOFCELLSIZE]
% distrain=data(1:560)'
% distest=disease(561:700)'
R1a=BARENUCLEI(1:547);
R2a=BLANDCHROMATIN(1:547);
R3a=CLUMPTHICKNESS(1:547);
R4a=MARGINALADHESION(1:547);
R5a=MITOSES(1:547);
R6a=NORMALNUCLEOLI(1:547);
R7a=SINGLEEPITHELIALCELLSIZE(1:547);
R8a=UNIFORMITYOFCELLSHAPE(1:547);
R9a=UNIFORMITYOFCELLSIZE(1:547);
R1b=BARENUCLEI(548:683);
R2b=BLANDCHROMATIN(548:683);
R3b=CLUMPTHICKNESS(548:683);
R4b=MARGINALADHESION(548:683);
R5b=MITOSES(548:683);
R6b=NORMALNUCLEOLI(548:683);
R7b=SINGLEEPITHELIALCELLSIZE(548:683);
R8b=UNIFORMITYOFCELLSHAPE(548:683);
R9b=UNIFORMITYOFCELLSIZE(548:683);
datatrain=[R1a R2a R3a R4a R5a R6a R7a R8a R9a]; % first 547 of the 683 valid rows (about 80 percent) used for training
datatest=[R1b R2b R3b R4b R5b R6b R7b R8b R9b]; % remaining 136 rows (about 20 percent) used for testing
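The CLASS labels imported above must be split in the same proportion so that they can serve as targets for the classifiers trained below. A minimal sketch follows; the variable names classtrain, classtest, ttrain and ttest are hypothetical and do not appear in the original script.
% Split the class labels to match the feature split above
classtrain = CLASS(1:547);                  % labels for the training rows
classtest  = CLASS(548:683);                % labels for the testing rows
% One-hot target matrices (2 x N) for the neural network, assuming the
% CLASS column holds two categories
ttrain = full(ind2vec(double(classtrain)'));
ttest  = full(ind2vec(double(classtest)'));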

The next stage involves data classification by developing a neural network and displaying a number of diagnostic plots.
%Classification
%% creating a network
net = patternnet(5);
view (net)
%% Training the dataset
% train takes (features x samples) inputs and a matching target matrix, so
% the transposed features and the one-hot labels (ttrain, sketched above)
% are used here rather than the test set
[y_net, tr1] = train(net, datatrain', ttrain)
%% A performance plot is used to check the training, validation and testing phases
figure(1)
plotperform(tr1)
grid on
% The performance is considered good when there are fewer false positives
% required to get a high true positive rate
figure(2)
plottrainstate(tr1)
grid on
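Before adding the SVM, the trained network can be checked on the held-out 20 percent. The sketch below is illustrative and assumes the transposed test features and the one-hot test labels (ttest) from the label-splitting sketch above.
% Evaluate the trained network on the test portion (illustrative sketch)
ypred = y_net(datatest');                       % network outputs, one column per sample
[~, predIdx] = max(ypred, [], 1);               % predicted class index per sample
[~, trueIdx] = max(ttest, [], 1);               % true class index per sample
testAccuracy = mean(predIdx == trueIdx) * 100;  % percentage of correct predictions
fprintf('ANN accuracy on the test set: %.2f percent\n', testAccuracy);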
Adding the SVM to improve the system performance over the base model of the ANN:
%% To train the system further using the support vector machines model
% fitcsvm takes the predictor matrix and the class labels (classtrain, from
% the label-splitting sketch above), not the test set
mdl = fitcsvm(datatrain, classtrain)
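The fitted SVM can then be assessed on the test portion; a short illustrative sketch, again assuming the classtest labels from the label-splitting sketch above:
% Predict on the held-out 20 percent and measure accuracy (illustrative)
svmPred = predict(mdl, datatest);               % predicted class labels
svmAccuracy = mean(svmPred == classtest) * 100; % percentage of correct predictions
fprintf('SVM accuracy on the test set: %.2f percent\n', svmAccuracy);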
The next stage involves assessing the fitness of the system; the search is considered finished when one of the following criteria is met (a check of criterion (ii) is sketched in code after the list):
(i) The fittest individuals in the population represent solutions good enough for the problem to be solved.
(ii) The population has converged. A gene has converged when 95% of the population shares the same value for that gene; once all genes have converged, the population is said to have converged. When this happens, the average fitness of the population is close to the fitness of the fittest individual.
(iii) The difference between the best solutions found in successive generations becomes small. At best this indicates that the population has reached an overall solution; at worst, that it has stalled at a local minimum.
(iv) A predetermined maximum number of generations has been reached.
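The convergence check of criterion (ii) can be written in a few lines. The sketch below is purely illustrative and operates on a hypothetical population matrix; no genetic algorithm code appears in the script above.
% Illustrative gene-convergence check: rows are individuals, columns are genes
population = randi([0 1], 50, 9);       % hypothetical population of 50 binary-coded individuals
geneConverged = false(1, size(population, 2));
for g = 1:size(population, 2)
    commonest = mode(population(:, g));                              % most frequent value of gene g
    geneConverged(g) = mean(population(:, g) == commonest) >= 0.95;  % 95 percent share it
end
populationConverged = all(geneConverged);   % true once every gene has converged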

The table of results showing the true positives and the false negatives is developed using the confusion matrix plot and the ROC curve, as sketched below.
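The plotting calls are sketched here; the sketch assumes the one-hot test targets (ttest) and the trained network from the sketches above.
% Confusion matrix and ROC curve for the ANN on the test set (illustrative)
ypred = y_net(datatest');       % network outputs on the test features
figure(3)
plotconfusion(ttest, ypred)     % true positives and false negatives per class
figure(4)
plotroc(ttest, ypred)           % ROC curve for each class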
References
Hindawi (2013). Hybrid model based on genetic algorithms and SVM applied to variable selection within fruit juice classification. The Scientific World Journal. https://www.hindawi.com/journals/tswj/2013/982438/

Bisgin, H., et al. (2018). Comparing SVM and ANN based machine learning methods for species identification of food contaminating beetles. Scientific Reports. https://www.nature.com/articles/s41598-018-24926-7
