A majorization-minimization algorithm for (multiple) hyperparameter learning

  • Authors:
  • Chuan-Sheng Foo; Chuong B. Do; Andrew Y. Ng

  • Affiliations:
  • Institute for Infocomm Research, Singapore; Stanford University, Stanford, CA; Stanford University, Stanford, CA

  • Venue:
  • ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning
  • Year:
  • 2009


Abstract

We present a general Bayesian framework for hyperparameter tuning in L2-regularized supervised learning models. Paradoxically, our algorithm works by first analytically integrating out the hyperparameters from the model. We find a local optimum of the resulting non-convex optimization problem efficiently using a majorization-minimization (MM) algorithm, in which the non-convex problem is reduced to a series of convex L2-regularized parameter estimation tasks. The principal appeal of our method is its simplicity: the updates for choosing the L2-regularized subproblems in each step are trivial to implement (or even perform by hand), and each subproblem can be efficiently solved by adapting existing solvers. Empirical results on a variety of supervised learning models show that our algorithm is competitive with both grid-search and gradient-based algorithms, but is more efficient and far easier to implement.
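
To make the abstract's claim that each MM update is "trivial to implement" concrete, here is a minimal sketch, not the paper's reference implementation. It assumes ridge regression with a single hyperparameter λ and a Gamma(a, b) hyperprior: integrating λ out yields a log penalty on ||w||², and majorizing that log term by its tangent line reduces each MM step to an ordinary ridge subproblem followed by a closed-form scalar update of the effective regularization weight. The function name, hyperprior choice, and synthetic data are all illustrative assumptions.

```python
import numpy as np

def mm_hyperparameter_learning(X, y, a=1.0, b=1.0, n_iters=50, tol=1e-8):
    """MM-style alternation for a single L2 hyperparameter (sketch).

    Assumes a Gamma(a, b) hyperprior on lambda; integrating lambda out
    gives a (a + d/2) * log(b + ||w||^2 / 2) penalty, whose tangent
    majorization turns each step into a standard ridge subproblem.
    """
    n, d = X.shape
    lam = 1.0            # initial effective regularization weight
    w = np.zeros(d)
    for _ in range(n_iters):
        # Convex subproblem: ridge regression with the current weight lam,
        # solvable by any existing L2-regularized solver (here, closed form).
        w_new = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
        # Trivial MM update: tangent majorization of the log penalty yields
        # a closed-form reweighting of the regularization hyperparameter.
        lam = (a + d / 2.0) / (b + 0.5 * (w_new @ w_new))
        if np.linalg.norm(w_new - w) < tol:
            w = w_new
            break
        w = w_new
    return w, lam

if __name__ == "__main__":
    # Illustrative usage on synthetic data (hypothetical setup).
    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 10))
    w_true = rng.standard_normal(10)
    y = X @ w_true + 0.1 * rng.standard_normal(100)
    w_hat, lam_hat = mm_hyperparameter_learning(X, y)
    print("learned lambda:", lam_hat)
```

Note how the structure mirrors the abstract: every iteration solves only a convex L2-regularized parameter estimation task, and the hyperparameter update is a one-line closed-form expression, which is the source of the MM method's monotone-descent guarantee on the non-convex integrated objective.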