Predictive automatic relevance determination by expectation propagation

  • Authors:
  • Yuan (Alan) Qi; Thomas P. Minka; Rosalind W. Picard; Zoubin Ghahramani

  • Affiliations:
  • MIT Media Laboratory, Cambridge, MA; Microsoft Research, Cambridge, UK; MIT Media Laboratory, Cambridge, MA; Gatsby Computational Neuroscience Unit, London, UK

  • Venue:
  • ICML '04: Proceedings of the Twenty-First International Conference on Machine Learning
  • Year:
  • 2004


Abstract

In many real-world classification problems, the input contains a large number of potentially irrelevant features. This paper proposes a new Bayesian framework for determining the relevance of input features. The approach extends one of the most successful Bayesian methods for feature selection and sparse learning, known as Automatic Relevance Determination (ARD). ARD determines the relevance of features by optimizing the model marginal likelihood, also known as the evidence. We show that this can lead to overfitting. To address this problem, we propose Predictive ARD, which is based instead on estimating the predictive performance of the classifier. While the true leave-one-out predictive performance is generally very costly to compute, the expectation propagation (EP) algorithm proposed by Minka provides an estimate of it as a side effect of its iterations. We exploit this estimate in our algorithm to perform feature selection, and to select data points in a sparse Bayesian kernel classifier. Moreover, we make two further improvements to previous algorithms: we replace Laplace's approximation with the generally more accurate EP, and we incorporate the fast optimization algorithm proposed by Faul and Tipping. Our experiments show that our method, based on the EP estimate of predictive performance, is more accurate on test data than relevance determination by evidence optimization.
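
The leave-one-out (LOO) shortcut the abstract describes can be made concrete: when EP deletes site i from its approximate posterior, the resulting "cavity" distribution approximates a posterior trained without point i, so an LOO predictive probability falls out of quantities EP already maintains. The Python sketch below illustrates this for a kernel probit classifier. It is our illustration under stated assumptions, not the authors' implementation; the names ep_probit and ep_loo_log_predictive are hypothetical, and the naive posterior refresh and clamped site precisions are simplifications of a production EP loop.

import numpy as np
from scipy.stats import norm

def ep_probit(K, y, n_sweeps=30, jitter=1e-6):
    # Parallel-update EP for a kernel probit classifier.
    # K: n x n Gram matrix; y: labels in {-1, +1}.
    n = len(y)
    t_tau = np.zeros(n)              # site precisions
    t_nu = np.zeros(n)               # site precision * site mean
    K = K + jitter * np.eye(n)
    Sigma, mu = K.copy(), np.zeros(n)
    for _ in range(n_sweeps):
        for i in range(n):
            # Cavity distribution: current posterior marginal with site i removed.
            tau_c = 1.0 / Sigma[i, i] - t_tau[i]
            nu_c = mu[i] / Sigma[i, i] - t_nu[i]
            m_c, v_c = nu_c / tau_c, 1.0 / tau_c
            # Moments of cavity * probit likelihood.
            z = y[i] * m_c / np.sqrt(1.0 + v_c)
            r = np.exp(norm.logpdf(z) - norm.logcdf(z))   # N(z)/Phi(z), computed stably
            m_hat = m_c + y[i] * v_c * r / np.sqrt(1.0 + v_c)
            v_hat = v_c - v_c ** 2 * r * (z + r) / (1.0 + v_c)
            # Moment matching gives new site parameters; precisions are
            # clamped positive so the naive matrix inverse below stays valid.
            t_tau[i] = max(1.0 / v_hat - tau_c, 1e-10)
            t_nu[i] = m_hat / v_hat - nu_c
        # Naive full refresh of the posterior; a real implementation would
        # use rank-one updates per site instead.
        Sigma = np.linalg.inv(np.linalg.inv(K) + np.diag(t_tau))
        mu = Sigma @ t_nu
    return mu, Sigma, t_tau, t_nu

def ep_loo_log_predictive(y, mu, Sigma, t_tau, t_nu):
    # EP's "free" LOO estimate: score each training point under its own
    # cavity distribution, i.e. the posterior with that point removed.
    tau_c = 1.0 / np.diag(Sigma) - t_tau
    nu_c = mu / np.diag(Sigma) - t_nu
    m_c, v_c = nu_c / tau_c, 1.0 / tau_c
    return np.sum(norm.logcdf(y * m_c / np.sqrt(1.0 + v_c)))

# Toy usage: a ridged linear kernel on 40 points with 10 features.
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 10))
y = np.sign(X[:, 0] - X[:, 1] + 0.3 * rng.standard_normal(40))
mu, Sigma, t_tau, t_nu = ep_probit(X @ X.T + np.eye(40), y)
print(ep_loo_log_predictive(y, mu, Sigma, t_tau, t_nu))

Predictive ARD would then rank features (or, in the sparse kernel classifier, basis functions) by this LOO estimate rather than by the evidence, pruning those whose removal preserves or improves it.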