K-Nearest Neighbors Algorithm for Iris Data Classification

This code implements the K-Nearest Neighbors algorithm for classifying Iris data. It loads the iris.mat file, randomizes the data, divides it into training and testing sets, computes the Euclidean distance from each test observation to every training observation, finds the k nearest neighbors, assigns the label that wins the majority vote among them, and returns the class labels. It also computes the confusion matrix.

% 1: Load the iris.mat file, which contains the Iris data and its labels
% separately.
% 2: Randomize the order of the data for each iteration so that new sets of
% training and test data are formed.
%
% The training data has size Nxd, where N is the number of
% measurements and d is the number of variables of the training data.
%
% Similarly, the test data has size Mxd, where M is the number of
% measurements and d is the number of variables of the test data.
% 3: For each observation in the test data, compute the euclidean distance
% from each observation in the training data: d(x,y) = sqrt(sum((x-y).^2)).
% 4: Evaluate the 'k' nearest neighbors among them and store them in an
% array.
% 5: Assign the label that wins the majority vote among the k neighbors.
% 5.1: Ties are discouraged by requiring an odd k; any remaining tie
% keeps the first label encountered.
% 6: Return the class label.
% 7: Compute the confusion matrix.
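% As a worked example of step 3, using the first two rows of Fisher's
% Iris data: for x = [5.1 3.5 1.4 0.2] and y = [4.9 3.0 1.4 0.2],
% d(x,y) = sqrt(0.2^2 + 0.5^2 + 0^2 + 0^2) = sqrt(0.29), about 0.5385.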
clear all;
clc;
% step 1: load iris.mat, which contains the Iris data and its labels.
load iris.mat;
% step 2: randomize the data and divide it into a 0.8:0.2 ratio for
% training and testing.
p=0.80; % fraction of the data used for training
numofobs=length(irisdata);
rearrangement=randperm(numofobs); % random permutation of the row indices
newirisdata=irisdata(rearrangement,:);
newirislabel=irislabel(rearrangement);
split=ceil(numofobs*p); % last row assigned to the training set
iristrainingdata = newirisdata(1:split,:);
iristraininglabel = newirislabel(1:split);
iristestdata = newirisdata(split+1:end,:);
originallabel = newirislabel(split+1:end);
numoftrainingdata = size(iristrainingdata,1);
numoftestdata = size(iristestdata,1);
% steps 3-6: classify every test observation with the k-NN function
% defined below.
k=3;
[predicted_labels,nn_index,accuracy] = KNN_(k,iristrainingdata,...
iristraininglabel,iristestdata,originallabel);
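% step 7: compute the confusion matrix. A minimal sketch, assuming the
% Statistics and Machine Learning Toolbox is available for confusionmat;
% without the toolbox, the matrix can be accumulated by hand instead.
C = confusionmat(originallabel,predicted_labels)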
function [predicted_labels,nn_index,accuracy] = KNN_(k,data,labels,t_data,t_labels)
%KNN_: classification using the k-nearest neighbors algorithm. The nearest
%neighbor search uses euclidean distance.

%Usage:
% [predicted_labels,nn_index,accuracy] = ...
% KNN_(3,iristrainingdata,iristraininglabel,iristestdata,originallabel)
% predicted_labels = KNN_(3,iristrainingdata,iristraininglabel,iristestdata)
%Input:
% - k: number of nearest neighbors
% - data: (NxD) training data; N is the number of samples and D is the
% dimensionality of each data point
% - labels: training labels
% - t_data: (MxD) testing data; M is the number of data points and D
% is the dimensionality of each data point
% - t_labels: testing labels (default = [])
%Output:
% - predicted_labels: the predicted labels based on the k-NN
% algorithm
% - nn_index: the index of the nearest training data point for each
% testing sample (Mx1)
% - accuracy: if the testing labels are supplied, the accuracy of
% the classification is returned, otherwise it will be zero
%checks
if nargin < 4
error('Too few input arguments.')
elseif nargin < 5
t_labels=[]; % no testing labels supplied
accuracy=0;
end
if size(data,2)~=size(t_data,2)
error('data should have the same dimensionality');
end
if mod(k,2)==0
error('to reduce the chance of ties, please choose odd k');
end
%initialization
predicted_labels=zeros(size(t_data,1),1);
ed=zeros(size(t_data,1),size(data,1)); %ed: (MxN) euclidean distances
ind=zeros(size(t_data,1),size(data,1)); %corresponding indices (MxN)
k_nn=zeros(size(t_data,1),k); %k-nearest neighbors for each testing sample (Mxk)
%calc euclidean distances between each testing data point and the training
%data samples
for test_point=1:size(t_data,1)
for train_point=1:size(data,1)
%calc and store sorted euclidean distances with corresponding indices
ed(test_point,train_point)=sqrt(...
sum((t_data(test_point,:)-data(train_point,:)).^2));
end
[ed(test_point,:),ind(test_point,:)]=sort(ed(test_point,:));
end
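% Note: the loops above are written out for clarity. A vectorized sketch,
% assuming the Statistics and Machine Learning Toolbox is available for
% pdist2, would compute the same quantities in two lines:
% ed = pdist2(t_data,data); % pairwise euclidean distances (MxN)
% [ed,ind] = sort(ed,2); % sort each row ascending, keeping indices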
%find the nearest k for each data point of the testing data
k_nn=ind(:,1:k);
nn_index=k_nn(:,1);
%get the majority vote
for i=1:size(k_nn,1)
options=unique(labels(k_nn(i,:)'));
max_count=0;
max_label=0;
for j=1:length(options)
L=length(find(labels(k_nn(i,:)')==options(j)));
if L>max_count
max_label=options(j);
max_count=L;
end
end
predicted_labels(i)=max_label;
end
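% Note: the voting loop above keeps the first label encountered when two
% classes tie. An equivalent sketch using base MATLAB's mode, which breaks
% ties by picking the smallest label value, would be:
% for i=1:size(k_nn,1)
% predicted_labels(i)=mode(labels(k_nn(i,:)));
% end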
%calculate the classification accuracy
if ~isempty(t_labels)
accuracy=length(find(predicted_labels==t_labels))/size(t_data,1);
end
end