Hybrid Method Analysis using ANN, SVM and DT for Machine Learning
Added on 2023/06/04
MACHINE LEARNING HYBRID METHOD ANALYSIS USING ANN, SVM AND DT
STUDENT ID NUMBER
STUDENT NAME
INTRODUCTION

Data analysis in machine learning goes through a training phase, and the remaining part of the data is used for testing. In this modelling case, the dataset consists of 699 rows, of which 80 percent is used for training and 20 percent for testing the developed model. The techniques employed in the analysis are the Artificial Neural Network (ANN), the Support Vector Machine (SVM) and the Decision Tree (DT). The neural network obtains an accuracy of up to 80 percent, while the support vector machine obtains 85 percent. The methods are combined to achieve a higher level of accuracy for the project. The aim of this project is to identify the true positives and the false negatives in the data. The ANN is the base technique, the SVM builds upon it, and the DT method completes the hybrid, giving an accuracy of 98.08 percent. These classifiers build hyperplane-based decision rules from the dataset to illustrate performance in a particular application. The SVM computes a linear weighted sum of the explanatory variables, which falls below or above a set threshold to decide the class. A number of rules are formed during the ANN training process; extracting these rules converts the opaque model into a transparent one.

MODELLING AND ANALYSIS

The first step involves data training. The MATLAB script imports the data into the software, allocates 80 percent of the rows as training data, and sets the remaining 20 percent aside for testing. The code snippet below shows the loading and segmentation of the data:

%% Import the data
[~, ~, raw] = xlsread('/home/pina/Documents/order809794/data01.xlsx','Sheet1');
raw = raw(2:end,:);
raw(cellfun(@(x) ~isempty(x) && isnumeric(x) && isnan(x),raw)) = {''};
stringVectors = string(raw(:,11));
stringVectors(ismissing(stringVectors)) = '';
raw = raw(:,[1,2,3,4,5,6,7,8,9,10]);

%% Exclude rows with non-numeric cells
I = ~all(cellfun(@(x) (isnumeric(x) || islogical(x)) && ~isnan(x),raw),2); % find rows with non-numeric cells
raw(I,:) = [];
stringVectors(I,:) = [];

%% Create output variable
data = reshape([raw{:}],size(raw));

%% Allocate imported array to column variable names
ID = data(:,1);
CLUMPTHICKNESS = data(:,2);
UNIFORMITYOFCELLSIZE = data(:,3);
UNIFORMITYOFCELLSHAPE = data(:,4);
MARGINALADHESION = data(:,5);
SINGLEEPITHELIALCELLSIZE = data(:,6);
BARENUCLEI = data(:,7);
BLANDCHROMATIN = data(:,8);
NORMALNUCLEOLI = data(:,9);
MITOSES = data(:,10);
CLASS = categorical(stringVectors(:,1));

%% Clear temporary variables
clearvars data raw stringVectors I;

%% Disease information
% disease = [BARENUCLEI BLANDCHROMATIN CLUMPTHICKNESS MARGINALADHESION MITOSES NORMALNUCLEOLI SINGLEEPITHELIALCELLSIZE UNIFORMITYOFCELLSHAPE UNIFORMITYOFCELLSIZE]
% distrain = data(1:560)'
% distest = disease(561:700)'

%% Split each attribute into training (rows 1-547) and testing (rows 548-683) portions
R1a = BARENUCLEI(1:547);
R2a = BLANDCHROMATIN(1:547);
R3a = CLUMPTHICKNESS(1:547);
R4a = MARGINALADHESION(1:547);
R5a = MITOSES(1:547);
R6a = NORMALNUCLEOLI(1:547);
R7a = SINGLEEPITHELIALCELLSIZE(1:547);
R8a = UNIFORMITYOFCELLSHAPE(1:547);
R9a = UNIFORMITYOFCELLSIZE(1:547);
R1b = BARENUCLEI(548:683);
R2b = BLANDCHROMATIN(548:683);
R3b = CLUMPTHICKNESS(548:683);
R4b = MARGINALADHESION(548:683);
R5b = MITOSES(548:683);
R6b = NORMALNUCLEOLI(548:683);
R7b = SINGLEEPITHELIALCELLSIZE(548:683);
R8b = UNIFORMITYOFCELLSHAPE(548:683);
R9b = UNIFORMITYOFCELLSIZE(548:683);
datatrain = [R1a R2a R3a R4a R5a R6a R7a R8a R9a]; % 80 percent of the total data (547 of 683 clean rows)
datatest  = [R1b R2b R3b R4b R5b R6b R7b R8b R9b]; % 20 percent of the total data (136 of 683 clean rows)
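The fixed split above takes the first 547 of the 683 clean rows for training and the last 136 for testing, which preserves the spreadsheet order. As a sketch of an alternative, a randomized 80/20 hold-out split can be drawn with cvpartition from the Statistics and Machine Learning Toolbox; the attribute variables are the ones created by the import script, and the rng(1) seed is only an assumption for reproducibility:

```matlab
%% Alternative: randomized 80/20 hold-out split (sketch, not part of the original script)
rng(1);  % assumed seed, for a reproducible random split
features = [CLUMPTHICKNESS UNIFORMITYOFCELLSIZE UNIFORMITYOFCELLSHAPE ...
            MARGINALADHESION SINGLEEPITHELIALCELLSIZE BARENUCLEI ...
            BLANDCHROMATIN NORMALNUCLEOLI MITOSES];
c = cvpartition(size(features,1), 'HoldOut', 0.2);  % reserve 20 percent for testing
datatrain  = features(training(c), :);  % roughly 80 percent of the rows
datatest   = features(test(c), :);      % roughly 20 percent of the rows
classtrain = CLASS(training(c));        % labels aligned with datatrain
classtest  = CLASS(test(c));            % labels aligned with datatest
```

A randomized split avoids any ordering bias if the spreadsheet rows happen to be sorted by class.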
The next stage involves data classification: a neural network is created and diagnostic plots are displayed.

%% Classification
%% Create a pattern-recognition network with 5 hidden neurons
net = patternnet(5);
view(net)

%% Train on the dataset
% patternnet expects inputs and one-hot class targets arranged as columns,
% so the training matrix is transposed and the labels are encoded with dummyvar
[net, tr1] = train(net, datatrain', dummyvar(CLASS(1:547))');

%% A performance plot is used to check the training, validation and testing phases
figure(1)
plotperform(tr1)
grid on
% The performance is considered good when few false positives are needed
% to reach a high true positive rate
figure(2)
plottrainstate(tr1)
grid on

Adding the SVM to improve the system performance over the base ANN model:

%% Train the system further using the support vector machine model
% fitcsvm expects the predictor matrix and the class labels of the same rows
mdl = fitcsvm(datatrain, CLASS(1:547))

The next stage involves assessing the fitness of the system; the search stops when one of the following criteria holds:
(i) The fittest individuals in the population represent solutions good enough for the problem to be solved.
(ii) The population has converged. A gene has converged when 95% of the population has the same value for that gene; once all the genes reach convergence, the population is said to have converged. When this happens, the average fitness of the population is close to that of the fittest individual.
(iii) The difference between the best solutions found in successive generations shrinks. This may indicate, at best, that the population has reached an overall solution or, on the contrary, that it has come to a standstill at a local minimum.
(iv) A predetermined maximum number of generations has been reached.
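The prose above credits the decision-tree stage with lifting the hybrid to 98.08 percent accuracy, but the script never shows that stage. A minimal sketch of it, assuming the datatrain/datatest matrices and CLASS labels defined earlier, uses the standard fitctree, predict and confusionmat functions; the accuracy this sketch prints depends on the data and is not guaranteed to reproduce the reported 98.08 percent:

```matlab
%% Decision tree stage of the hybrid (sketch, assuming the variables above)
classtrain = CLASS(1:547);    % labels for the training rows
classtest  = CLASS(548:683);  % labels for the testing rows

tree = fitctree(datatrain, classtrain);  % grow a classification tree
pred = predict(tree, datatest);          % classify the held-out rows

% The confusion matrix holds the true/false positive and negative counts:
% diagonal entries are correct classifications, off-diagonal entries are errors
cm = confusionmat(classtest, pred)
accuracy = sum(diag(cm)) / sum(cm(:))
```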
The table of true positive and false negative results is developed using the confusion plot and the ROC curve.

References

Hindawi (2013). Hybrid model based on genetic algorithms and SVM applied to variable selection within Fruit Punch classification. https://www.hindawi.com/journals/tswj/2013/982438/
Bisgin, H. et al. (2018). Comparing SVM and ANN based machine learning methods for species identification of food contaminating beetles. https://www.nature.com/articles/s41598-018-24926-7