We show that several important Bayesian bounds studied in machine learning, in both the batch and online settings, arise from an application of a simple compression lemma. In particular, we use the compression lemma to derive (i) PAC-Bayesian bounds in the batch setting, (ii) Bayesian log-loss bounds in the online setting, and (iii) Bayesian bounded-loss bounds in the online setting. Although each setting attaches different semantics to the prior, the posterior, and the loss, we show that the core bounding argument is the same. The paper simplifies our understanding of several important and apparently disparate results and brings to light a powerful tool for developing similar arguments for other methods.
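For context, a minimal sketch of the kind of change-of-measure inequality usually referred to as a compression lemma is given below; the exact statement used in the paper may differ in form, and this is the standard Donsker–Varadhan version stated over a prior \(P\) and posterior \(Q\) on hypotheses \(h\):

% Hedged sketch: standard change-of-measure (Donsker–Varadhan) inequality,
% not necessarily the paper's exact formulation.
\[
\mathbb{E}_{h \sim Q}\!\left[\phi(h)\right]
\;\le\;
\mathrm{KL}(Q \,\|\, P)
\;+\;
\log \mathbb{E}_{h \sim P}\!\left[e^{\phi(h)}\right],
\]
which holds for any measurable function \(\phi\) and any distributions \(P\) and \(Q\) with \(\mathrm{KL}(Q \,\|\, P) < \infty\). Under this reading, the batch PAC-Bayesian bounds and the online log-loss and bounded-loss bounds would follow by choosing \(\phi\) to be a suitably scaled loss or regret quantity and then controlling the moment term \(\log \mathbb{E}_{P}[e^{\phi}]\) for each setting.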