A PAC-Bayes bound for tailored density estimation

Authors:
Matthew Higgs;John Shawe-Taylor
Affiliations:
Center for Computational Statistics and Machine Learning, University College London;Center for Computational Statistics and Machine Learning, University College London
Venue:
ALT'10 Proceedings of the 21st international conference on Algorithmic learning theory
Year:
2010

Abstract

In this paper we construct a general method for reporting on the accuracy of density estimation. Using variational methods from statistical learning theory, we derive a PAC, algorithm-dependent bound on the distance between the data-generating distribution and a learned approximation. The distance measure takes the role of a loss function that can be tailored to the learning problem, enabling us to control discrepancies on tasks relevant to subsequent inference. We apply the bound to an efficient mixture learning algorithm. Using the method of localisation, we encode properties of both the algorithm and the data-generating distribution, producing a tight, empirical, algorithm-dependent upper risk bound on the performance of the learner. We discuss other uses of the bound for arbitrary distributions and model averaging.