Breaking the curse of kernelization: budgeted stochastic gradient descent for large-scale SVM training

  • Authors:
  • Zhuang Wang; Koby Crammer; Slobodan Vucetic

  • Affiliations:
  • Department of Computer and Information Sciences, Temple University, and Corporate Technology, Siemens Corporation, Princeton, NJ; Department of Electrical Engineering, The Technion, Haifa, Israel; Department of Computer and Information Sciences, Temple University, Philadelphia, PA

  • Venue:
  • The Journal of Machine Learning Research
  • Year:
  • 2012


Abstract

Online algorithms that process one example at a time are advantageous when dealing with very large data or with data streams. Stochastic Gradient Descent (SGD) is such an algorithm and is an attractive choice for online Support Vector Machine (SVM) training due to its simplicity and effectiveness. When equipped with kernel functions, similarly to other SVM learning algorithms, SGD is susceptible to the curse of kernelization, which causes unbounded linear growth in model size and update time with data size. This may render SGD inapplicable to large data sets. We address this issue by presenting a class of Budgeted SGD (BSGD) algorithms for large-scale kernel SVM training which have constant space and constant time complexity per update. Specifically, BSGD keeps the number of support vectors bounded during training through several budget maintenance strategies. We treat the budget maintenance as a source of gradient error, and show that the gap between the BSGD and the optimal SVM solutions depends on the model degradation due to budget maintenance. To minimize the gap, we study greedy budget maintenance methods based on removal, projection, and merging of support vectors. We propose budgeted versions of several popular online SVM algorithms that belong to the SGD family. We further derive BSGD algorithms for multi-class SVM training. Comprehensive empirical results show that BSGD achieves accuracy higher than that of state-of-the-art budgeted online algorithms and comparable to that of non-budgeted algorithms, while achieving impressive computational efficiency both in time and space during training and prediction.
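
The sketch below illustrates the general idea described in the abstract: a kernelized SGD pass over the data whose model size is kept bounded by a budget maintenance step. It is a minimal illustration, not the paper's algorithm; the Pegasos-style step size, the RBF kernel, and the removal heuristic (dropping the support vector with the smallest coefficient magnitude) are all assumptions made for the example, whereas the paper also studies projection- and merging-based maintenance.

```python
# Minimal sketch of budgeted kernel SGD for a binary hinge-loss SVM.
# Assumptions (not from the paper): RBF kernel, Pegasos-style step size,
# and removal-based budget maintenance that drops the support vector
# with the smallest |coefficient|.
import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    return np.exp(-gamma * np.sum((x - z) ** 2))

def budgeted_sgd(X, y, lam=0.01, budget=50, gamma=1.0):
    sv_x, sv_alpha = [], []              # support vectors and coefficients
    for t, (x_t, y_t) in enumerate(zip(X, y), start=1):
        eta = 1.0 / (lam * t)            # Pegasos-style learning rate
        # Kernel expansion: f(x_t) = sum_i alpha_i * K(sv_i, x_t)
        f = sum(a * rbf_kernel(s, x_t, gamma) for s, a in zip(sv_x, sv_alpha))
        # Regularization shrinks all existing coefficients
        sv_alpha = [(1.0 - eta * lam) * a for a in sv_alpha]
        if y_t * f < 1.0:                # hinge loss active: add a new SV
            sv_x.append(x_t)
            sv_alpha.append(eta * y_t)
            if len(sv_x) > budget:       # budget maintenance by removal
                j = int(np.argmin(np.abs(sv_alpha)))
                sv_x.pop(j)
                sv_alpha.pop(j)
    return sv_x, sv_alpha

# Usage on toy data: the model size stays bounded by the budget,
# so update time per example is constant regardless of data size.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = np.where(X[:, 0] + X[:, 1] >= 0, 1.0, -1.0)
sv_x, sv_alpha = budgeted_sgd(X, y, budget=50)
print(len(sv_x))  # at most 50 support vectors
```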