Many machine learning algorithms can be formulated as the minimization of a training criterion that involves a hyperparameter. This hyperparameter is usually chosen by trial and error, guided by a model selection criterion. In this article we present a methodology for optimizing several hyperparameters, based on computing the gradient of a model selection criterion with respect to the hyperparameters. In the case of a quadratic training criterion, the gradient of the selection criterion with respect to the hyperparameters can be computed efficiently by backpropagating through a Cholesky decomposition. In the more general case, we show that the implicit function theorem can be used to derive a formula for the hyperparameter gradient involving second derivatives of the training criterion.