Preventing Over-Fitting during Model Selection via Bayesian Regularisation of the Hyper-Parameters

Authors:
Gavin C. Cawley; Nicola L. C. Talbot
Venue:
The Journal of Machine Learning Research
Year:
2007

Abstract

While the model parameters of a kernel machine are typically given by the solution of a convex optimisation problem with a single global optimum, the selection of good values for the regularisation and kernel parameters is much less straightforward. Fortunately, the leave-one-out cross-validation procedure can be performed, or at least approximated, very efficiently in closed form for a wide variety of kernel learning methods, providing a convenient means for model selection. Leave-one-out cross-validation estimates of performance, however, generally exhibit a relatively high variance, so model selection based on them is prone to over-fitting. In this paper, we investigate the novel use of Bayesian regularisation at the second level of inference: a regularisation term, corresponding to a prior over the hyper-parameter values, is added to the model selection criterion, and the additional regularisation parameters it introduces are integrated out analytically. Results obtained on a suite of thirteen real-world and synthetic benchmark data sets clearly demonstrate the benefit of this approach.
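The two ingredients named in the abstract, closed-form leave-one-out cross-validation and a hyper-parameter prior whose scale is integrated out, can be illustrated with a minimal sketch. It is not the authors' code, and several choices are assumptions made for illustration: kernel ridge regression without a bias term (an LS-SVM adds one), a Gaussian RBF kernel with inverse width `gamma`, a Gaussian pseudo-likelihood on the leave-one-out residuals with Q = ½ Σ r_i², a Gaussian prior on `gamma` alone with Ω = ½ γ², and Jeffreys' priors on both scale parameters. Integrating those scales out gives the criterion M = (n/2) log Q + (m/2) log Ω, where m is the number of regularised hyper-parameters. The function names (`loo_residuals`, `regularised_criterion`, and so on) are hypothetical.

```python
import numpy as np


def rbf_kernel(X, Z, gamma):
    """Gaussian RBF kernel k(x, z) = exp(-gamma * ||x - z||^2)."""
    sq = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Z**2, axis=1)[None, :]
        - 2.0 * X @ Z.T
    )
    return np.exp(-gamma * np.maximum(sq, 0.0))


def loo_residuals(K, y, lam):
    """Closed-form leave-one-out residuals for kernel ridge regression.

    With C = K + lam*I and alpha = C^{-1} y, the residual for pattern i
    when it is omitted from training is r_i = alpha_i / [C^{-1}]_ii,
    so all n residuals cost one inversion instead of n refits.
    """
    C_inv = np.linalg.inv(K + lam * np.eye(len(y)))
    alpha = C_inv @ y
    return alpha / np.diag(C_inv)


def loo_residuals_brute_force(K, y, lam):
    """Reference check: refit n times, each time leaving one pattern out."""
    n = len(y)
    r = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        A = K[np.ix_(keep, keep)] + lam * np.eye(n - 1)
        alpha = np.linalg.solve(A, y[keep])
        r[i] = y[i] - K[i, keep] @ alpha
    return r


def regularised_criterion(X, y, log_lam, log_gamma):
    """Hypothetical Bayesian-regularised model selection criterion.

    Q     = 0.5 * sum(r_i^2)    (half the PRESS statistic)
    Omega = 0.5 * gamma^2       (assumed prior penalty on gamma only)
    M     = (n/2) log Q + (m/2) log Omega, with m = 1 regularised
            parameter, after integrating out both scale parameters
            under (improper) Jeffreys' priors.
    """
    lam, gamma = np.exp(log_lam), np.exp(log_gamma)
    r = loo_residuals(rbf_kernel(X, X, gamma), y, lam)
    n, m = len(y), 1
    Q = 0.5 * np.sum(r**2)
    Omega = 0.5 * gamma**2
    return 0.5 * n * np.log(Q) + 0.5 * m * np.log(Omega)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(-3.0, 3.0, size=(40, 1))
    y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(40)
    # Compare the criterion at a few kernel widths, with lam fixed at 0.1.
    for log_gamma in (-1.0, 0.0, 1.0):
        M = regularised_criterion(X, y, np.log(0.1), log_gamma)
        print(f"log_gamma={log_gamma:+.1f}  M={M:.3f}")
```

Because the Jeffreys prior is improper, the (m/2) log Ω term diverges to minus infinity as the regularised hyper-parameters approach zero. Under these assumptions, the criterion is therefore best minimised locally, for example by gradient descent from a sensible starting point, rather than by an unbounded global search.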