
Large Scale Support Vector Machines PDF

   

Added on  2020-05-16

TECHNIQUES TO IMPROVE EFFICIENCY OF LARGE SCALE SUPPORT VECTOR MACHINES
Nishanth Yalam
Harrisburg University

Abstract
The proposed research addresses a company's need to improve the efficiency of large-scale
support vector machines (LSVM) by reducing the training data and decreasing the
testing-phase time. The research includes the development of an algorithm that can improve
LSVM efficiency, and a training and testing methodology for choosing minimal data sets to
train and test the LSVM. The scope of the research is limited to samples, the LSVM
algorithm, and the testing phase; it does not include LSVM implementation methods.
After analyzing the problem and the needs of LSVM, a methodology based on variable
decomposition and constraint decomposition was chosen to address the training problem. This
methodology avoids the need for high-end computational resources and also reduces
training time. To improve generalization, a methodology is adopted that reduces the number
of support vectors by optimizing the kernel parameters. A vector correlation principle and
a greedy algorithm are also included in the algorithm to further reduce the number of
support vectors involved in the testing phase.
A detailed list of research tasks and major milestones is included in this proposal. The
research will require an available data generation system to supply the potential training
and testing data. Completing the proposed research will also require the latest version of
the programming software OCTAVE for developing and implementing the necessary algorithms.
The deliverables of this research will include an improved SVM algorithm with better
generalization capability. By the time the research is complete, the LSVM algorithm will
have been fully tested and installed on the client's server.

1. Introduction
Machine learning is a branch of artificial intelligence that concentrates on designing and
developing methods that let systems become more intelligent and independent by learning
from data. For example, machine learning can be used to train a system to distinguish
between spam and non-spam messages. The core of machine learning deals with
representation and generalization. Representation concerns analyzing the data instances to
identify the properties that are useful for learning. Generalization is the ability of the
system to perform the desired job well on unseen data instances. Put another way, machine
learning is about building a model and training it with examples drawn from an unknown
probability distribution so that it can make accurate predictions on new instances. One
approach to machine learning is the support vector machine (SVM). Support vector
machines are supervised learning models, with associated learning algorithms, that analyze
data and recognize patterns; they are used for classification and regression analysis.
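For reference, the standard textbook soft-margin SVM training problem can be written in its primal and dual forms as below. The notation ($w$, $b$, $\xi_i$, $C$, $\alpha_i$, $K$) is the conventional one and is an assumption here, since this proposal does not define its own symbols.

```latex
% Primal (soft margin), for training pairs (x_i, y_i) with y_i in {-1, +1}
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{s.t.} \quad y_i \left( w^{\top} x_i + b \right) \ge 1 - \xi_i,\ \ \xi_i \ge 0

% Dual (a quadratic program in the multipliers alpha_i), with kernel K
\max_{\alpha}\ \sum_{i=1}^{n} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j\, y_i y_j\, K(x_i, x_j)
\quad \text{s.t.} \quad 0 \le \alpha_i \le C,\ \ \sum_{i=1}^{n} \alpha_i y_i = 0
```

The training points with $\alpha_i > 0$ in the dual solution are the support vectors.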
Traditional training algorithms for SVM, such as chunking, scale super-linearly with the
number of examples and therefore become infeasible for large training sets. Because dataset
sizes have grown steadily over the past few years, training algorithms that can handle
large datasets are needed. Large-scale datasets are defined as datasets that
cannot be stored in a modern computer's memory. Large-scale training algorithms use one of
the following methods:
1. Variants of primal stochastic gradient descent (SGD)
2. Quadratic programming in the dual
SGD generalizes well even though it is a poor optimizer. Popular algorithms that use
SGD are PEGASOS and FOLOS.
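To make the first method concrete, the following is a minimal Python sketch of a Pegasos-style stochastic sub-gradient update for a linear SVM. It is not taken from this proposal: the function names (`pegasos_train`, `predict`), the regularization value `lam`, the iteration count, and the toy data are all assumptions chosen for illustration, and the bias term and the optional projection step are omitted for brevity.

```python
import random


def pegasos_train(samples, labels, lam=0.01, n_iters=2000, seed=0):
    """Pegasos-style SGD for a linear SVM without bias (illustrative sketch).

    samples: list of equal-length feature tuples; labels: +1 or -1.
    lam, n_iters and seed are hypothetical defaults, not tuned values.
    """
    rng = random.Random(seed)
    w = [0.0] * len(samples[0])
    for t in range(1, n_iters + 1):
        # Draw one training example uniformly at random.
        i = rng.randrange(len(samples))
        x, y = samples[i], labels[i]
        eta = 1.0 / (lam * t)  # decreasing step size 1 / (lambda * t)
        margin = y * sum(wj * xj for wj, xj in zip(w, x))
        # Shrink w: this is the gradient step on the regularization term.
        w = [(1.0 - eta * lam) * wj for wj in w]
        # Hinge-loss sub-gradient step, taken only if the margin is violated.
        if margin < 1.0:
            w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 or -1 according to the sign of the linear score w . x."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


# Toy data, separable by a line through the origin (assumed for the example).
X = [(2.0, 1.0), (1.0, 2.0), (3.0, 3.0), (-2.0, -1.0), (-1.0, -2.0), (-3.0, -3.0)]
Y = [1, 1, 1, -1, -1, -1]
w = pegasos_train(X, Y)
predictions = [predict(w, x) for x in X]
```

Each iteration touches a single example, so the cost per step does not depend on the size of the training set, which is why this family of methods is attractive when the data does not fit in memory.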

In general, the LSVM model is used for linear classification, but with the help of the kernel
trick it can also be applied to non-linear classification. LSVM classifiers are used in credit
risk evaluation, text and hypertext categorization, image classification, and the medical
sciences. For example, LSVM has been used to classify proteins, with up to 90% of the
compounds classified correctly, and to recognize handwritten characters. In the past few
years, LSVM has become a very prominent machine learning approach, drawing the attention of
many researchers and leading companies to invest large amounts in developing better
algorithms.
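The kernel trick mentioned above can be illustrated with a small Python check. The sketch assumes a homogeneous degree-2 polynomial kernel on 2-D inputs, chosen only because its explicit feature map is short; the function names are hypothetical. It shows that evaluating the kernel in the original space gives the same value as a dot product in the higher-dimensional feature space, without ever building that space.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel: K(x, z) = (x . z)^2."""
    return dot(x, z) ** 2


def poly2_features(x):
    """Explicit feature map for 2-D input: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


x, z = (1.0, 2.0), (3.0, -1.0)
kernel_value = poly2_kernel(x, z)                              # computed in 2-D
feature_value = dot(poly2_features(x), poly2_features(z))      # computed in 3-D
same = math.isclose(kernel_value, feature_value)
```

A linear classifier trained on the 3-D features is a quadratic classifier in the original 2-D space, which is how a linear model becomes applicable to non-linear classification.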
1.2 Problem Description
A large training set strains the computational complexity of a learning algorithm, demanding
more sophisticated computing equipment and drastically increasing the training cost.
Because large-scale support vector machines (LSVM) involve a huge training cost, small
companies have to be very careful when deciding whether to use LSVM. Small companies' lack
of access to high-performance computing equipment slows down the training process and
thereby lengthens the service time for customers. Many researchers working on LSVM try
to find an optimal way to handle large datasets, applying different techniques to make
training faster and more cost-effective.
Training LSVM algorithms requires enormous memory and considerable computation time
because of the enormous amount of training data and the non-linear programming problem
involved. In general, most LSVMs select training samples at random, which results in long
training times to identify the prominent properties of the training sets. The main
drawback of random selection is the significant randomness it introduces into the training
sets. On the other hand, randomness in the sample data is important for improving the
generalization capability of the algorithm, so that it can be applied to a wide variety of data.
Training algorithm with random training sets requires high computational capable equipment
