Bayes' rule specifies how to obtain a posterior from a class of hypotheses endowed with a prior, given the observed data. There are three fundamental ways to use this posterior for predicting the future: marginalization (integration over the hypotheses w.r.t. the posterior), MAP (taking the a posteriori most probable hypothesis), and stochastic model selection (selecting a hypothesis at random according to the posterior distribution). If the hypothesis class is countable and contains the data-generating distribution (this is termed the "realizable case"), strong consistency theorems are known for the former two methods in a sequential prediction framework, asserting almost sure convergence of the predictions to the truth as well as loss bounds. We prove corresponding results for stochastic model selection, for both discrete and continuous observation spaces. As a main technical tool, we use the concept of a potential: this quantity, which is always positive, measures the total possible amount of future prediction errors. More precisely, at each time step, the expected decrease of the potential upper bounds the expected error. We introduce the entropy potential of a hypothesis class as its worst-case entropy with respect to the true distribution. Our results are proven within a general stochastic online prediction framework, which comprises both online classification and prediction of non-i.i.d. sequences.
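The three prediction rules can be illustrated on a toy instance of the realizable case. The following sketch (illustrative only, not from the paper; all names and parameter values are hypothetical) sequentially predicts a binary sequence over a finite class of Bernoulli hypotheses, updating the posterior by Bayes' rule and forecasting via marginalization, MAP, or posterior sampling:

```python
import random

def predict(posterior, thetas, mode, rng):
    """Probability assigned to the next symbol being 1."""
    if mode == "marginalize":   # mixture over hypotheses w.r.t. the posterior
        return sum(w * t for w, t in zip(posterior, thetas))
    if mode == "map":           # a posteriori most probable hypothesis
        return thetas[max(range(len(thetas)), key=posterior.__getitem__)]
    if mode == "stochastic":    # sample one hypothesis from the posterior
        return rng.choices(thetas, weights=posterior)[0]
    raise ValueError(mode)

def update(posterior, thetas, x):
    """Bayes' rule: reweight each hypothesis by its likelihood of x."""
    post = [w * (t if x == 1 else 1 - t) for w, t in zip(posterior, thetas)]
    z = sum(post)
    return [w / z for w in post]

rng = random.Random(0)
thetas = [0.1, 0.5, 0.9]   # countable (here: finite) hypothesis class
true_theta = 0.9           # realizable case: the truth is in the class
prior = [1 / 3] * 3        # uniform prior

for mode in ("marginalize", "map", "stochastic"):
    post = prior[:]
    for _ in range(200):
        x = 1 if rng.random() < true_theta else 0
        post = update(post, thetas, x)
    # The posterior concentrates on the true hypothesis, so all three
    # rules converge to the same forecasts almost surely.
    print(mode, round(predict(post, thetas, mode, rng), 3))
```

The consistency results in the abstract say, roughly, that this convergence of the predictions to the truth is not an accident of the example: in the realizable case it holds almost surely for marginalization and MAP, and the paper extends it to the stochastic (posterior-sampling) rule.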