We describe and analyze a simple and effective stochastic sub-gradient descent algorithm for solving the optimization problem cast by Support Vector Machines (SVM). We prove that the number of iterations required to obtain a solution of accuracy $${\epsilon}$$ is $${\tilde{O}(1 / \epsilon)}$$, where each iteration operates on a single training example. In contrast, previous analyses of stochastic gradient descent methods for SVMs require $${\Omega(1 / \epsilon^2)}$$ iterations. As in previously devised SVM solvers, the number of iterations also scales linearly with 1/λ, where λ is the regularization parameter of SVM. For a linear kernel, the total run-time of our method is $${\tilde{O}(d/(\lambda \epsilon))}$$, where d is a bound on the number of non-zero features in each example. Since the run-time does not depend directly on the size of the training set, the resulting algorithm is especially suited for learning from large datasets. Our approach also extends to non-linear kernels while working solely on the primal objective function, though in this case the runtime does depend linearly on the training set size. Our algorithm is particularly well suited for large text classification problems, where we demonstrate an order-of-magnitude speedup over previous SVM learning methods.
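The single-example stochastic sub-gradient step described above can be sketched as follows. This is a minimal illustration for the linear-kernel case, not the authors' reference implementation: it assumes labels in {-1, +1}, uses the step size $\eta_t = 1/(\lambda t)$, and includes an optional projection onto the ball of radius $1/\sqrt{\lambda}$; the function name `pegasos_train` and the hyperparameter defaults are ours.

```python
import numpy as np

def pegasos_train(X, y, lam=0.1, n_iters=1000, seed=0):
    """Stochastic sub-gradient descent on the SVM primal objective.

    A minimal sketch: labels y must be in {-1, +1}; lam is the
    regularization parameter lambda from the objective.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for t in range(1, n_iters + 1):
        i = rng.integers(n)           # each iteration touches one example
        eta = 1.0 / (lam * t)         # step size 1/(lambda * t)
        if y[i] * X[i].dot(w) < 1:    # hinge loss active: full sub-gradient
            w = (1 - eta * lam) * w + eta * y[i] * X[i]
        else:                         # only the regularizer contributes
            w = (1 - eta * lam) * w
        # optional projection onto the ball of radius 1/sqrt(lambda)
        radius = 1.0 / np.sqrt(lam)
        norm = np.linalg.norm(w)
        if norm > radius:
            w *= radius / norm
    return w
```

Because each iteration draws a single random example and does O(d) work (d being the number of non-zero features when X is sparse), the per-iteration cost, and hence the total run-time, is independent of the training-set size, which is the property the analysis exploits.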