Introduction
Machine Learning is considered a subfield of Artificial Intelligence (AI) and is concerned with the development of techniques and methods that enable computers to learn; in simple terms, the development of algorithms that enable a machine to learn and to perform tasks and activities. Machine learning overlaps with statistics in many ways [1].
The Support Vector Machine (SVM) is a classification and regression prediction tool that uses machine learning theory to maximise predictive accuracy while automatically avoiding over-fitting to the data. Support Vector Machines can be defined as systems that use a hypothesis space of linear functions in a high-dimensional feature space, trained with a learning algorithm from optimisation theory that implements a learning bias derived from statistical learning theory. The SVM was initially popular in the NIPS community and is now an active part of machine learning research around the world. SVM became well known when, using pixel maps as input, it achieved accuracy comparable to sophisticated neural networks with hand-crafted features on a handwriting recognition task [2]. It is also used in many applications, such as handwriting analysis and face analysis, especially for pattern classification and regression based applications. The foundations of Support Vector Machines (SVM) were developed by Vapnik [3] and the method gained popularity owing to many promising features such as better empirical performance.
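For reference, the standard soft-margin SVM formulation that this description alludes to can be stated as follows (a textbook formulation, not a derivation taken from the report itself):

\min_{w,\,b,\,\xi}\ \tfrac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\xi_i
\quad \text{subject to} \quad y_i\big(w^\top \phi(x_i) + b\big) \ge 1 - \xi_i,\qquad \xi_i \ge 0,

and the learned classifier takes the kernelized form

f(x) = \operatorname{sign}\Big(\sum_{i=1}^{n} \alpha_i\, y_i\, K(x_i, x) + b\Big),

where K(x_i, x) = \phi(x_i)^\top \phi(x) is the kernel function and the \alpha_i are obtained by solving the dual optimisation problem.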
Part A
Answer 1:
Set 1: The best kernel is the linear one. Rationale: the samples appear to be drawn from two Gaussians with the same covariance, for which the optimal decision boundary is linear. It is best to choose the simplest adequate model to discourage overfitting. The choice can also be confirmed by the error rate on the test set.
Set 2: The best kernel is the second-order polynomial one. Rationale: the decision boundary between the two classes appears to be a parabola, a curve of degree 2. It is certainly not linear, and the Gaussian kernel overfits. The choice can also be confirmed by the error rate on the test set.
Set 3: The best kernel is the Gaussian one. Rationale: the clusters are clearly not separable by curves of degree one or two. Since the Gaussian kernel can always separate the points, it is the best choice here. Moreover, points are similar under the Gaussian kernel when they are close to each other in the original space, and a small distance to the cluster centre is clearly the defining property of the classes. The choice can also be confirmed by the error rate on the test set.
C = 1000;   % value of C
param = 20; % chosen standard deviation (kernel parameter)

train_data = set1_train; % training set
test_data = set1_test;   % testing set
svm_test(@Klinear, param, C, train_data, test_data)   % linear kernel SVM for the first set of data
%%
train_data = set2_train; % training set
test_data = set2_test;   % testing set
svm_test(@Kpoly, param, C, train_data, test_data)     % polynomial kernel SVM for the second set of data
%%
train_data = set3_train; % training set
test_data = set3_test;   % testing set
svm_test(@Kgaussian, param, C, train_data, test_data) % Gaussian kernel SVM for the third set of data
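The kernel helpers Klinear, Kpoly and Kgaussian used above are provided separately and are not part of this preview; a minimal sketch of what such functions might compute is given below (the exact signatures, and the use of param as the polynomial degree or the Gaussian width, are assumptions; in MATLAB each function would normally live in its own .m file):

function K = Klinear(X1, X2, param)   % linear kernel; param is ignored
K = X1 * X2';
end

function K = Kpoly(X1, X2, param)     % polynomial kernel of degree param
K = (1 + X1 * X2').^param;
end

function K = Kgaussian(X1, X2, param) % Gaussian (RBF) kernel with width param
D2 = sum(X1.^2, 2) + sum(X2.^2, 2)' - 2 * X1 * X2'; % squared Euclidean distances
K = exp(-D2 / (2 * param^2));
end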
Answer 2
Kernel       Test error
linear       0.1375
2nd order    0.12
Gaussian     0.085
All three kernels perform better than logistic regression (which had a test error of 0.1425).
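The svm_test helper that produced these error rates is not shown in the preview; the numbers are presumably the fraction of misclassified test points, roughly as in this sketch (it assumes a trained svm struct with the fields stored by nusvm_train below, and test data with fields X and y):

% sketch: evaluating a trained SVM on test data and computing the error rate
Ktest = feval(svm.kernel, svm.XS, test_data.X, svm.param); % kernel between support vectors and test points
fvals = (svm.beta' * Ktest)' + svm.w0;                     % decision values for each test point
ypred = sign(fvals);                                       % predicted labels in {-1,+1}
test_err = mean(ypred ~= test_data.y);                     % fraction of misclassified test points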
function svm = nusvm_train(data, kernel, param, nu)
y = data.y;
X = data.X;
n = length(y);
% evaluate the kernel matrix
K = feval(kernel,X,X,param); % n x n positive semi-definite matrix
K = (K+K')/2; % should be symmetric; if not, may replace by an equivalent symmetric kernel
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% For Part 4 of the problem, you must fill in the following section.
% Make sure you understand the parameters to 'quadprog' (doc quadprog).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
D = diag(y);           % diagonal matrix with D(i,i) = y(i)
H = D*K*D;             % H(i,j) = y(i)*K(i,j)*y(j)
f = zeros(n,1);
A = -ones(1,n); b = -nu;   % inequality constraint: sum(alpha) >= nu
Aeq = y'; beq = 0.0;       % equality constraint: sum(alpha_i*y_i) = 0
LB = zeros(n,1);
UB = 1/n * ones(n,1);      % box constraint: 0 <= alpha_i <= 1/n
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
X0 = zeros(n,1);
warning off; % suppress 'Warning: Large-scale method ...'
alpha = quadprog(H+1e-10*eye(n),f,A,b,Aeq,beq,LB,UB,X0);
warning on;
% Essentially, we have added a (weak) regularization term to the dual problem,
% favoring the minimum-norm alpha when the solution is underdetermined. This is
% also important numerically, as any round-off error in the computation of H
% could potentially cause the dual problem to become ill-posed (minimizer at
% infinity). The regularization term forces the Hessian to be positive definite.

% select support vectors
S = find(alpha > eps);
NS = length(S);
beta = alpha(S).*y(S);
XS = X(S,:);

% estimate w0 (the bias parameter) robustly from the margin vectors
dpos = find((y > 0) & (alpha > 0) & (alpha < 1/n));
dneg = find((y < 0) & (alpha > 0) & (alpha < 1/n));
margvecs = [dpos ; dneg];
npos = length(dpos);
nneg = length(dneg);
Mpos = reshape(repmat(reshape(K(S,dpos), [NS npos 1]), [1 1 nneg]), [NS npos*nneg]);
Mneg = reshape(repmat(reshape(K(S,dneg), [NS 1 nneg]), [1 npos 1]), [NS npos*nneg]);
rho = mean(0.5*beta'*(Mpos - Mneg));
w0 = median(rho*y(margvecs) - sum(diag(beta)*K(S,margvecs))');
% store the results
svm.kernel = kernel;
svm.NS = NS;
svm.w0 = w0;
svm.beta = beta;
svm.XS = XS;
svm.rho = rho;
svm.param = param;
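A usage sketch for the function above (set1_train is assumed, as in the earlier script, to be a struct with the fields X and y that nusvm_train expects; the values param = 20 and nu = 0.1 are only illustrative):

% sketch: training a nu-SVM on the first data set with the Gaussian kernel
svm = nusvm_train(set1_train, @Kgaussian, 20, 0.1);
fprintf('number of support vectors: %d\n', svm.NS);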
The Naive Bayes model is easy to build and particularly useful for very large data sets. Along with its simplicity, Naive Bayes is known to outperform even highly sophisticated classification methods.
Bayes' theorem provides a way of calculating the posterior probability P(c|x) from P(c), P(x) and P(x|c):
P(c|x) = P(x|c) * P(c) / P(x)
Given the training dataset:
X1  2
X2  9
we need to create a frequency table and a likelihood table.
P(c|x) is the posterior probability of the class (target) given the predictor (attribute).
P(c) is the prior probability of the class.
P(x|c) is the likelihood, i.e. the probability of the predictor given the class.
P(x) is the prior probability of the predictor.
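As an illustration of these quantities, a minimal MATLAB sketch of a categorical Naive Bayes computation is given below (the toy labels and attribute values are invented for the example; the report's own frequency and likelihood tables are not part of this preview):

% sketch: Naive Bayes posterior for a single categorical attribute
y  = [1; 1; 2; 2; 2; 1; 2; 1];   % class labels (toy data)
x  = [1; 2; 2; 1; 2; 2; 1; 2];   % attribute values (toy data)
xq = 2;                          % query attribute value

classes = unique(y);
post = zeros(size(classes));
for k = 1:numel(classes)
    c    = classes(k);
    Pc   = mean(y == c);             % prior P(c) from the frequency table
    Pxc  = mean(x(y == c) == xq);    % likelihood P(x|c) from the likelihood table
    post(k) = Pxc * Pc;              % unnormalized posterior P(c|x)
end
post = post / sum(post);             % normalize by P(x)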
PART B
PCA and clustering –
Principal Component Analysis (PCA) is a method that uses an orthogonal transformation to convert a set of correlated variables into a set of uncorrelated variables. PCA is one of the most widely used tools in
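A minimal MATLAB sketch of the transformation described above (centre the data, then project onto the eigenvectors of the covariance matrix; the toy data and variable names are illustrative):

% sketch: PCA via the eigendecomposition of the covariance matrix
X  = randn(100, 5) * randn(5, 5);       % toy data: 100 samples of 5 correlated variables
Xc = X - mean(X, 1);                    % centre each variable
[V, D] = eig(cov(Xc));                  % eigenvectors/eigenvalues of the covariance matrix
[~, order] = sort(diag(D), 'descend');  % order components by explained variance
V = V(:, order);
scores = Xc * V;                        % uncorrelated principal component scores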
