The probability of observing x_t at time t, given past observations x_1...x_{t-1}, can be computed if the true generating distribution μ of the sequences x_1 x_2 x_3... is known. If μ is unknown but known to belong to a class ℳ, one can base one's prediction on the Bayes mixture ξ, defined as a weighted sum of the distributions ν ∈ ℳ. Various convergence results of the mixture posterior ξ_t to the true posterior μ_t are presented. In particular, a new (elementary) derivation of the convergence ξ_t/μ_t → 1 is provided, which additionally gives the rate of convergence. A general sequence predictor is allowed to choose an action y_t based on x_1...x_{t-1} and receives loss ℓ_{x_t y_t} if x_t is the next symbol of the sequence. No assumptions are made on the structure of ℓ (apart from it being bounded) or of ℳ. The Bayes-optimal prediction scheme Λ_ξ based on the mixture ξ and the Bayes-optimal informed prediction scheme Λ_μ are defined, and the total loss L_ξ of Λ_ξ is bounded in terms of the total loss L_μ of Λ_μ. It is shown that L_ξ is bounded for bounded L_μ and that L_ξ/L_μ → 1 for L_μ → ∞. Convergence of the instantaneous losses is also proven.
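The mixture predictor and the loss-minimizing action described above can be illustrated with a minimal Python sketch. It assumes a finite class of Bernoulli distributions over a binary alphabet, which is far more restrictive than the general setting of the abstract. The function name `bayes_mixture_run` and all parameter names are hypothetical and chosen for illustration only.

```python
def bayes_mixture_run(xs, thetas, weights, loss):
    """Run a Bayes-mixture predictor on a binary sequence.

    Illustrative assumptions (not taken from the paper):
      xs      -- observed symbols, each 0 or 1
      thetas  -- P(x = 1) for each Bernoulli model nu in a finite class
      weights -- prior weights w_nu, summing to 1
      loss    -- loss[x][y]: bounded loss for outcome x and action y
    Returns (xi_probs, actions, total_loss), where xi_probs[t] is the
    mixture probability assigned to the symbol that actually occurred.
    """
    w = list(weights)
    n_actions = len(loss[0])
    xi_probs, actions, total = [], [], 0.0
    for x in xs:
        # Mixture predictive distribution: posterior-weighted sum over the class.
        p1 = sum(wi * th for wi, th in zip(w, thetas))
        probs = [1.0 - p1, p1]
        # Action minimizing the expected loss under the mixture.
        y = min(range(n_actions),
                key=lambda a: sum(probs[s] * loss[s][a] for s in (0, 1)))
        total += loss[x][y]
        xi_probs.append(probs[x])
        actions.append(y)
        # Bayes update of the weights after seeing x.
        like = [th if x == 1 else 1.0 - th for th in thetas]
        z = sum(wi * li for wi, li in zip(w, like))
        w = [wi * li / z for wi, li in zip(w, like)]
    return xi_probs, actions, total


# Example: the data are all ones, and the class contains theta = 0.8.
zero_one = [[0, 1], [1, 0]]  # loss[x][y] = 1 if y != x else 0
xi_probs, actions, total = bayes_mixture_run(
    [1] * 50, [0.2, 0.5, 0.8], [1 / 3, 1 / 3, 1 / 3], zero_one)
print(xi_probs[-1], actions[-1], total)
```

On this toy input the posterior weight concentrates on the model that best explains the data, so the mixture's predictive probability approaches that model's probability and the cumulative loss stops growing after the first few steps. This is only an illustration of the kind of behaviour the convergence and loss bounds describe, not a demonstration of the bounds themselves.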