Gradient-Based Optimization of Hyperparameters

  • Authors: Yoshua Bengio
  • Affiliation: Département d'informatique et recherche opérationnelle, Université de Montréal, Montréal, Québec, Canada, H3C 3J7
  • Venue: Neural Computation
  • Year: 2000

Abstract

Many machine learning algorithms can be formulated as the minimization of a training criterion that involves a hyperparameter. This hyperparameter is usually chosen by trial and error with a model selection criterion. In this article, we present a methodology for optimizing several hyperparameters, based on the computation of the gradient of a model selection criterion with respect to the hyperparameters. In the case of a quadratic training criterion, the gradient of the selection criterion with respect to the hyperparameters is efficiently computed by backpropagating through a Cholesky decomposition. In the more general case, we show that the implicit function theorem can be used to derive a formula for the hyperparameter gradient that involves second derivatives of the training criterion.
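
To make the quadratic case concrete, the sketch below computes the hyperparameter gradient for ridge regression: the training criterion is minimized in closed form via a Cholesky factorization, and the derivative of the weights with respect to the regularization hyperparameter follows from the implicit function theorem (the Hessian of the training criterion times dw/dλ equals minus the mixed second derivative, here simply the weight vector). This is an illustrative sketch, not the paper's implementation; the single hyperparameter, the train/validation split, and all variable names are assumptions made for the example.

```python
import numpy as np

def ridge_hypergradient(X_tr, y_tr, X_val, y_val, lam):
    """Gradient of the validation squared error with respect to the ridge
    hyperparameter lam, using the implicit function theorem.

    Training criterion: C(w, lam) = ||X_tr w - y_tr||^2 + lam ||w||^2
    Selection criterion: E(w)     = ||X_val w - y_val||^2
    """
    n_features = X_tr.shape[1]
    A = X_tr.T @ X_tr + lam * np.eye(n_features)  # half the Hessian of C w.r.t. w
    L = np.linalg.cholesky(A)                     # A = L @ L.T

    def solve(b):
        # Solve A x = b by forward/backward substitution with the Cholesky factor.
        return np.linalg.solve(L.T, np.linalg.solve(L, b))

    w = solve(X_tr.T @ y_tr)        # minimizer of the training criterion
    dw_dlam = -solve(w)             # IFT: A (dw/dlam) = -w, since dA/dlam = I
    resid_val = X_val @ w - y_val
    dE_dw = 2.0 * X_val.T @ resid_val   # gradient of the selection criterion w.r.t. w
    return float(dE_dw @ dw_dlam)       # chain rule: dE/dlam = (dE/dw) . (dw/dlam)
```

With such a hypergradient in hand, model selection can proceed by gradient descent on lam (or on log lam to keep it positive) rather than by trial-and-error search over a grid of values.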