Machine learning is most often cast as an optimization problem. Ideally, one would like a convex objective function, so as to rely on efficient convex optimizers with nice guarantees such as the absence of local optima. Yet non-convexity is very frequent in practice, and it may sometimes be inappropriate to seek convexity at any price. Alternatively, one can decide not to limit a priori the modeling expressivity to models whose learning can be solved by convex optimization, and instead rely on non-convex optimization algorithms. The main motivation of this work is to provide efficient and scalable algorithms for non-convex optimization. We focus on regularized unconstrained optimization problems, which cover a large number of modern machine learning problems such as logistic regression, conditional random fields, and large-margin estimation. We propose a novel algorithm for minimizing a regularized objective that is able to handle convex and non-convex, smooth and non-smooth risks. The algorithm is based on the cutting-plane technique and on the idea of exploiting the regularization term in the objective function. It may be thought of as a limited-memory extension of convex regularized bundle methods that handles both convex and non-convex risks. When the risk is convex, the algorithm is proved to converge to a stationary solution with accuracy ε at a rate O(1/λε), where λ is the regularization parameter of the objective function, under the assumption of a Lipschitz empirical risk. When the risk is non-convex, obtaining such a proof is more difficult and requires a stronger and more debatable assumption. Nevertheless, we provide experimental results on artificial test problems, and on five standard and difficult machine learning problems cast as convex and non-convex optimization problems, showing that our algorithm compares well in practice with state-of-the-art optimization algorithms.
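The core idea behind cutting-plane training of a regularized objective, adding subgradient planes of the empirical risk and re-minimizing the piecewise-linear risk model plus the quadratic regularizer, can be sketched as follows for the convex case. This is a minimal BMRM-style illustration on the average hinge loss (a linear SVM), not the paper's limited-memory algorithm; the function name and the Frank-Wolfe solver for the dual subproblem are our own illustrative choices.

```python
import numpy as np

def cutting_plane_risk_min(X, y, lam=0.1, n_planes=30, qp_iters=200):
    """Minimize J(w) = lam/2 ||w||^2 + R(w) by cutting planes,
    where R(w) is the average hinge risk (a convex, non-smooth risk)."""
    n, d = X.shape
    # Cutting-plane model of R: R(w) >= a_t . w + b_t for each plane t.
    # The hinge risk is non-negative, so the zero plane is always valid.
    A = [np.zeros(d)]
    B = [0.0]
    w = np.zeros(d)
    history = []
    for _ in range(n_planes):
        # New plane: a subgradient of the average hinge risk at the current w.
        margins = y * (X @ w)
        active = margins < 1
        a = -(y[active, None] * X[active]).sum(axis=0) / n
        b = np.maximum(0.0, 1.0 - margins).mean() - a @ w
        A.append(a)
        B.append(b)
        Am, Bm = np.array(A), np.array(B)
        # Minimize  lam/2 ||w||^2 + max_t (a_t . w + b_t)  via its dual,
        # a concave QP over the simplex, solved here by Frank-Wolfe steps.
        alpha = np.ones(len(B)) / len(B)
        for _ in range(qp_iters):
            v = Am.T @ alpha                 # v = -lam * w(alpha)
            g = Bm - (Am @ v) / lam          # gradient of the dual objective
            dvec = -alpha
            dvec[int(np.argmax(g))] += 1.0   # direction toward best vertex
            Ad = Am.T @ dvec
            denom = Ad @ Ad
            if denom < 1e-12:
                break
            # Exact line search for the quadratic dual along dvec.
            t = float(np.clip(lam * (g @ dvec) / denom, 0.0, 1.0))
            alpha = alpha + t * dvec
        w = -(Am.T @ alpha) / lam
        history.append(lam / 2 * (w @ w)
                       + np.maximum(0.0, 1.0 - y * (X @ w)).mean())
    return w, history
```

The regularizer is what makes this work: the quadratic term keeps each subproblem bounded and drives the O(1/λε) rate quoted above, whereas cutting planes on the risk alone would oscillate. The paper's contribution is extending this scheme to non-convex risks with limited memory, which this convex sketch does not cover.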