Bundle Methods for Regularized Risk Minimization

Authors:
Choon Hui Teo;S.V.N. Vishwanathan;Alex J. Smola;Quoc V. Le
Venue:
The Journal of Machine Learning Research
Year:
2010

Abstract

A wide variety of machine learning problems can be described as minimizing a regularized risk functional, with different algorithms using different notions of risk and different regularizers. Examples include linear Support Vector Machines (SVMs), Gaussian Processes, Logistic Regression, Conditional Random Fields (CRFs), and Lasso, among others. This paper describes the theory and implementation of a scalable and modular convex solver that solves all of these estimation problems. It can be parallelized on a cluster of workstations, allows for data locality, and can deal with regularizers such as L1 and L2 penalties. In addition to the unified framework, we present tight convergence bounds, which show that our algorithm converges in O(1/ε) steps to ε precision for general convex problems and in O(log (1/ε)) steps for continuously differentiable problems. We demonstrate the performance of our general-purpose solver on a variety of publicly available data sets.
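
To make the "regularized risk functional" concrete, the sketch below evaluates an objective of the form lam * regularizer(w) + average loss for a linear model. This is only an illustration of the modular idea the abstract describes, not the paper's solver. The function names, the binary-label margin formulation, and the 1/2 factor in the L2 penalty are all assumptions made for the example. Swapping the loss or the regularizer yields different estimators, for instance hinge loss with an L2 penalty for a linear SVM, or logistic loss with an L1 penalty for sparse logistic regression.

```python
import math

def dot(w, x):
    """Inner product of two equal-length sequences."""
    return sum(wi * xi for wi, xi in zip(w, x))

# Loss functions take the margin y * <w, x> for a label y in {-1, +1}.
def hinge_loss(margin):
    return max(0.0, 1.0 - margin)

def logistic_loss(margin):
    # log(1 + exp(-margin)), written to avoid overflow for large |margin|.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Regularizers (the 1/2 factor on the L2 penalty is a convention chosen here).
def l2_regularizer(w):
    return 0.5 * sum(wi * wi for wi in w)

def l1_regularizer(w):
    return sum(abs(wi) for wi in w)

def regularized_risk(w, data, loss, regularizer, lam):
    """Return lam * regularizer(w) + mean loss over (x, y) pairs in data."""
    empirical_risk = sum(loss(y * dot(w, x)) for x, y in data) / len(data)
    return lam * regularizer(w) + empirical_risk

# Hypothetical toy data: two points with labels +1 and -1.
data = [([1.0, 0.0], 1), ([0.0, 1.0], -1)]
w = [1.0, -1.0]

svm_objective = regularized_risk(w, data, hinge_loss, l2_regularizer, lam=0.1)
sparse_logreg_objective = regularized_risk(w, data, logistic_loss, l1_regularizer, lam=0.1)
```

Because the loss and the regularizer are passed in as plain functions, a solver written against `regularized_risk` does not need to know which estimation problem it is solving. That separation is the kind of modularity the abstract refers to.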