Pegasos: Primal Estimated sub-GrAdient SOlver for SVM

Authors:
Shai Shalev-Shwartz;Yoram Singer;Nathan Srebro
Affiliations:
The Hebrew University, Jerusalem, Israel;The Hebrew University, Jerusalem, Israel;Toyota Technological Institute, Chicago
Venue:
Proceedings of the 24th international conference on Machine learning
Year:
2007


Abstract

We describe and analyze a simple and effective iterative algorithm for solving the optimization problem cast by Support Vector Machines (SVM). Our method alternates between stochastic gradient descent steps and projection steps. We prove that the number of iterations required to obtain a solution of accuracy ε is Õ(1/ε). In contrast, previous analyses of stochastic gradient descent methods require Ω(1/ε²) iterations. As in previously devised SVM solvers, the number of iterations also scales linearly with 1/λ, where λ is the regularization parameter of SVM. For a linear kernel, the total run-time of our method is Õ(d/(λε)), where d is a bound on the number of non-zero features in each example. Since the run-time does not depend directly on the size of the training set, the resulting algorithm is especially suited for learning from large datasets. Our approach can seamlessly be adapted to employ non-linear kernels while working solely on the primal objective function. We demonstrate the efficiency and applicability of our approach by conducting experiments on large text classification problems, comparing our solver to existing state-of-the-art SVM solvers. For example, it takes less than 5 seconds for our solver to converge when solving a text classification problem from Reuters Corpus Volume 1 (RCV1) with 800,000 training examples.
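
The alternation between stochastic sub-gradient steps and projection steps can be illustrated with a minimal sketch. The abstract does not state the step-size schedule, the projection radius, or the exact objective, so the choices below are assumptions: the step size 1/(λt), the ball of radius 1/√λ, and the objective (λ/2)‖w‖² plus average hinge loss follow the commonly described form of the algorithm. The function names `pegasos_train` and `predict` are hypothetical, and each step uses a single randomly drawn example, which is why the per-iteration cost depends on the number of features rather than on the training set size.

```python
import math
import random


def pegasos_train(examples, lam=0.1, n_iters=2000, seed=0):
    """Sketch of a stochastic sub-gradient SVM solver with a projection step.

    examples: list of (x, y) pairs; x is a list of floats, y is -1 or +1.
    lam: regularization parameter (lambda in the abstract).
    """
    rng = random.Random(seed)
    dim = len(examples[0][0])
    w = [0.0] * dim
    radius = 1.0 / math.sqrt(lam)  # assumed projection radius
    for t in range(1, n_iters + 1):
        x, y = rng.choice(examples)  # one random example per iteration
        eta = 1.0 / (lam * t)        # assumed step-size schedule
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        # Sub-gradient step on (lam/2)*||w||^2 + hinge loss of the sampled example.
        w = [(1.0 - eta * lam) * wi for wi in w]
        if margin < 1.0:
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
        # Projection step: rescale w back into the ball of the assumed radius.
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > radius:
            w = [wi * radius / norm for wi in w]
    return w


def predict(w, x):
    """Return the sign of the linear score w.x as a label in {-1, +1}."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0.0 else -1
```

A call such as `pegasos_train(data, lam=0.1, n_iters=2000)` on a small list of `(features, label)` pairs returns a weight vector that `predict` can apply to new points. This dense-list version is for illustration only; a practical implementation would presumably use sparse vectors so that each step touches only the non-zero features.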