We present a new classification approach, using a variational Bayesian estimation of probit regression with Laplace priors. Laplace priors have previously been used extensively as a sparsity-inducing mechanism to perform feature selection simultaneously with classification or regression. However, contrary to the 'myth' of sparse Bayesian learning with Laplace priors, we find that the sparsity effect is a property of the maximum a posteriori (MAP) parameter estimates only. The Bayesian estimates, in turn, induce a posterior weighting rather than a hard selection of features, and have different advantageous properties: (1) they provide better estimates of the prediction uncertainty; (2) they retain correlated features, which favours generalisation; (3) they are more stable with respect to the hyperparameter choice; and (4) they produce a weight-based ranking of the features that is suited for interpretation. We analyse the behaviour of the Bayesian estimate in comparison with its MAP counterpart, as well as other related models, (a) through a graphical interpretation of the associated shrinkage and (b) through controlled numerical simulations in a range of testing conditions. The results pinpoint the situations in which the advantages of the Bayesian estimates can be exploited. Finally, we demonstrate our method on a gene expression classification task.
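To make the contrast between the MAP and the Bayesian estimate concrete, the sketch below (our illustration, not code from the paper) compares the two estimators in the simplest setting: a one-dimensional Gaussian likelihood with a Laplace prior on the weight. The values of sigma2 and lam are arbitrary illustrative choices. The MAP estimate reduces to the familiar soft-thresholding rule and yields exact zeros, whereas the posterior mean, computed here by numerical integration, only shrinks the weight towards zero.

```python
# A minimal numerical sketch (not from the paper): MAP estimate vs. posterior mean
# for a one-dimensional Gaussian likelihood y ~ N(w, sigma2) with a Laplace prior
# p(w) proportional to exp(-lam * |w|).
import numpy as np

sigma2 = 1.0   # assumed noise variance (illustrative choice)
lam = 1.0      # assumed Laplace prior scale (illustrative choice)

def map_estimate(y):
    """MAP estimate: soft-thresholding, which produces exact zeros."""
    return np.sign(y) * np.maximum(np.abs(y) - lam * sigma2, 0.0)

def posterior_mean(y, grid=np.linspace(-20.0, 20.0, 20001)):
    """Posterior mean by numerical integration: shrinks but never hits zero exactly."""
    log_post = -(y - grid) ** 2 / (2.0 * sigma2) - lam * np.abs(grid)
    w = np.exp(log_post - log_post.max())   # unnormalised posterior weights on the grid
    return np.sum(grid * w) / np.sum(w)

for y in [0.5, 1.5, 3.0]:
    print(f"y={y:4.1f}  MAP={map_estimate(y):+.3f}  posterior mean={posterior_mean(y):+.3f}")
```

The same qualitative contrast carries over to probit regression with Laplace priors on the weight vector, which is the setting analysed in the paper: the MAP solution prunes features exactly, while the variational Bayesian estimate retains all features with graded posterior weights.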