
(PDF) SVM Classification with Linear and RBF kernels

   

Added on  2021-05-27


Practical 9 - Predictive Modelling

Answer each of the questions below using the examples and code provided in the working Python file:

1. How many features are there for the iris dataset? How many examples? How many labels?

There are four features in the iris dataset, all measured in centimetres:

1. Sepal length
2. Sepal width
3. Petal length
4. Petal width

Each column is a feature (also known as a predictor, attribute, independent variable, input, regressor, or covariate).

There are 50 samples for each species of iris flower (Iris setosa, Iris versicolor and Iris virginica). This gives 150 records (examples), each with the 4 features listed above. Each row is an observation (also known as a sample, example, instance, or record).

Labels are also known as targets. Each value that we predict is the response (also known as the target, outcome, label, or dependent variable). Classification is a supervised learning task in which the label is categorical. There are 150 labels in the iris dataset, falling into 3 categories:

0 = Setosa
1 = Versicolor
2 = Virginica

2. Why is it important to split the dataset into training and test set? Why does a classification model need to be trained on the training set and its prediction performance measured on the test set?

In machine learning, a model is an algorithm with parameters that are adjusted so that it performs well on the application, that is, so that it can predict the values we are interested in.

We train the model on data called the training data or training set. The training data already contains the actual values the model should predict, so the algorithm adjusts its parameters to fit the data in the training set.

To find out whether the trained model is good overall, we use test data (the test set): separate data for which we know the true values but which was never shown to the model during training. If the trained model also performs well on the test set, we can say that the machine learning model is good.
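The dataset counts and the train/test workflow described above can be sketched as follows. This is only an illustration, not the code from the working Python file: it assumes scikit-learn's bundled copy of the iris data (`load_iris`), a 70/30 stratified split, and default `SVC` hyperparameters for the linear and RBF kernels named in the title. The variable names and `random_state=0` are arbitrary choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load the iris data: X holds the features, y holds the integer labels (0, 1, 2).
iris = load_iris()
X, y = iris.data, iris.target

# Rows are examples, columns are features.
n_examples, n_features = X.shape
n_classes = len(iris.target_names)
print(f"examples={n_examples}, features={n_features}, classes={n_classes}")

# Hold out 30% of the rows as a test set the model never sees during training.
# stratify=y keeps the three classes in the same proportion in both parts.
# (The 70/30 ratio and the seed are assumptions made for this sketch.)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Fit one SVM per kernel on the training set only, then score on the test set.
scores = {}
for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel)
    clf.fit(X_train, y_train)
    scores[kernel] = clf.score(X_test, y_test)  # fraction of test rows predicted correctly
    print(f"{kernel} kernel test accuracy: {scores[kernel]:.3f}")
```

The accuracy printed for each kernel is measured only on the held-out rows, so it estimates how the classifier would behave on data it has not seen.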
It is important to learn the predictive model (i.e. the classifier) on the training set and test its performance on the test set. The purpose of predictive modelling is to create models that are able to predict on future data. Hence it is important to keep the training and test data separate and not to use test data for learning predictive models.

A classification model can be used to predict the class label of unknown records. A classification technique is a systematic approach to building classification models from an input set. The model generated by a learning algorithm should both fit the input data well and correctly predict the class labels of records it has never seen before.

First, a training set consisting of records whose class labels are known must be provided. The training set is used to build a classification model, which is subsequently applied to the test set, whose class labels are not shown to the model. Evaluation of the performance of a classification model is based on the counts of test records correctly and incorrectly predicted by the model.

3. How can correlation analysis help identify the best features for the classification task? What are the best features for the iris data based on correlation analysis results?

Data correlation is the way in which one set of data may correspond to another set. For a classification problem, feature selection aims to select a subset of highly discriminant features. In other words, it selects features that are capable of discriminating between samples that belong to different classes.

Because label information is available in a classification problem, the relevance of a feature is assessed by its ability to distinguish classes. For example, a feature fi is said to be relevant to a class cj if fi and cj are highly correlated.

Classification is the problem of identifying to which of a set of categories a new observation belongs, on the basis of a training set of data containing observations whose category membership is known.

Based on the correlation analysis results, petal_length and petal_width are the best features for iris classification. As the pair plots show, petal_length and petal_width are also highly correlated with each other. If you train a model on features that have little or no correlation with the class label, it will tend to give inaccurate results.

4. Which class is easier to identify than the other two classes for the iris dataset? How can you tell it?

According to the correlation analysis results, the class Setosa (target value 0) is easier to identify than the other two classes (1 = Versicolor, 2 = Virginica) in the iris dataset.

As is evident in the plotted graph, Setosa (shown in blue) is easily separable and can be distinguished from the other two species in the iris dataset. Its points form a group with a clear boundary around it, which makes it easy to separate from the other two classes.
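The answers to questions 3 and 4 can be checked numerically with a rough sketch like the one below. It is not the analysis from the working Python file: it assumes scikit-learn's copy of the iris data (whose feature names differ from the `petal_length` / `petal_width` column names used above) and uses the Pearson correlation between each feature and the integer-coded label as a simple relevance heuristic. Treating class codes 0, 1, 2 as numbers is itself an assumption that happens to suit this dataset.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

# Pearson correlation of each feature column with the integer class label.
# This is a heuristic: it treats the label codes 0/1/2 as ordered numbers.
corr_with_label = {
    name: float(np.corrcoef(X[:, i], y)[0, 1])
    for i, name in enumerate(iris.feature_names)
}

# Rank features by the strength (absolute value) of that correlation.
ranked = sorted(corr_with_label, key=lambda n: abs(corr_with_label[n]), reverse=True)
for name in ranked:
    print(f"{name}: {corr_with_label[name]:+.3f}")

# Question 4: check whether Setosa (label 0) can be separated from the other
# two classes with a single threshold on petal length (column index 2).
petal_length = X[:, 2]
setosa_max = float(petal_length[y == 0].max())
others_min = float(petal_length[y != 0].min())
setosa_separable = setosa_max < others_min
print(f"largest Setosa petal length: {setosa_max} cm")
print(f"smallest non-Setosa petal length: {others_min} cm")
print(f"Setosa separable by one threshold: {setosa_separable}")
```

If the largest Setosa petal length is smaller than the smallest petal length of the other two species, any threshold between the two values isolates Setosa, which is the numeric counterpart of the clear gap seen in the pair plot.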
