Machine learning is most often cast as an optimization problem. Ideally, one would like a convex objective function, so as to rely on efficient convex optimizers with nice guarantees such as the absence of local optima. Yet non-convexity is very frequent in practice, and it may sometimes be inappropriate to seek convexity at any price. Alternatively, one can decide not to limit a priori the modeling expressivity to models whose learning can be solved by convex optimization, and instead rely on non-convex optimization algorithms. The main motivation of this work is to provide efficient and scalable algorithms for non-convex optimization. We focus on regularized unconstrained optimization problems, which cover a large number of modern machine learning problems such as logistic regression, conditional random fields, and large-margin estimation. We propose a novel algorithm for minimizing a regularized objective that is able to handle convex and non-convex, smooth and non-smooth risks. The algorithm is based on the cutting-plane technique and on the idea of exploiting the regularization term in the objective function. It may be thought of as a limited-memory extension of convex regularized bundle methods that handles both convex and non-convex risks. When the risk is convex, the algorithm is proved to converge to a stationary solution with accuracy ε at a rate O(1/λε), where λ is the regularization parameter of the objective function, under the assumption of a Lipschitz empirical risk. When the risk is non-convex, obtaining such a proof is more difficult and requires a stronger and more debatable assumption. Nevertheless, we provide experimental results on artificial test problems, and on five standard and difficult machine learning problems cast as convex and non-convex optimization problems, showing that our algorithm compares well in practice with state-of-the-art optimization algorithms.
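The core idea behind cutting-plane training of a regularized objective, adding subgradient planes of the empirical risk and re-minimizing the piecewise-linear risk model plus the quadratic regularizer, can be sketched as follows for the convex case. This is a minimal BMRM-style illustration on the average hinge loss (a linear SVM), not the paper's limited-memory algorithm; the function name and the Frank-Wolfe solver for the dual subproblem are our own illustrative choices.

```python
import numpy as np

def cutting_plane_risk_min(X, y, lam=0.1, n_planes=30, qp_iters=200):
    """Minimize J(w) = lam/2 ||w||^2 + R(w) by cutting planes,
    where R(w) is the average hinge risk (a convex, non-smooth risk)."""
    n, d = X.shape
    # Cutting-plane model of R: R(w) >= a_t . w + b_t for each plane t.
    # The hinge risk is non-negative, so the zero plane is always valid.
    A = [np.zeros(d)]
    B = [0.0]
    w = np.zeros(d)
    history = []
    for _ in range(n_planes):
        # New plane: a subgradient of the average hinge risk at the current w.
        margins = y * (X @ w)
        active = margins < 1
        a = -(y[active, None] * X[active]).sum(axis=0) / n
        b = np.maximum(0.0, 1.0 - margins).mean() - a @ w
        A.append(a)
        B.append(b)
        Am, Bm = np.array(A), np.array(B)
        # Minimize  lam/2 ||w||^2 + max_t (a_t . w + b_t)  via its dual,
        # a concave QP over the simplex, solved here by Frank-Wolfe steps.
        alpha = np.ones(len(B)) / len(B)
        for _ in range(qp_iters):
            v = Am.T @ alpha                 # v = -lam * w(alpha)
            g = Bm - (Am @ v) / lam          # gradient of the dual objective
            dvec = -alpha
            dvec[int(np.argmax(g))] += 1.0   # direction toward best vertex
            Ad = Am.T @ dvec
            denom = Ad @ Ad
            if denom < 1e-12:
                break
            # Exact line search for the quadratic dual along dvec.
            t = float(np.clip(lam * (g @ dvec) / denom, 0.0, 1.0))
            alpha = alpha + t * dvec
        w = -(Am.T @ alpha) / lam
        history.append(lam / 2 * (w @ w)
                       + np.maximum(0.0, 1.0 - y * (X @ w)).mean())
    return w, history
```

The regularizer is what makes this work: the quadratic term keeps each subproblem bounded and drives the O(1/λε) rate quoted above, whereas cutting planes on the risk alone would oscillate. The paper's contribution is extending this scheme to non-convex risks with limited memory, which this convex sketch does not cover.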