Optimality of universal Bayesian sequence prediction for general loss and alphabet

Authors:
Marcus Hutter
Affiliations:
IDSIA, Galleria 2, 6928 Manno-Lugano, Switzerland
Venue:
The Journal of Machine Learning Research
Year:
2003



Abstract

Various optimality properties of universal sequence predictors based on Bayes-mixtures in general, and Solomonoff's prediction scheme in particular, will be studied. The probability of observing x_t at time t, given past observations x_1...x_{t-1}, can be computed with the chain rule if the true generating distribution μ of the sequences x_1 x_2 x_3... is known. If μ is unknown, but known to belong to a countable or continuous class M, one can base one's prediction on the Bayes-mixture ξ, defined as a w_ν-weighted sum or integral of distributions ν ∈ M. The cumulative expected loss of the Bayes-optimal universal prediction scheme based on ξ is shown to be close to the loss of the Bayes-optimal, but infeasible, prediction scheme based on μ. We show that the bounds are tight and that no other predictor can lead to significantly smaller bounds. Furthermore, for various performance measures, we show Pareto-optimality of ξ and give an Occam's razor argument that the choice w_ν ∼ 2^{-K(ν)} for the weights is optimal, where K(ν) is the length of the shortest program describing ν. The results are applied to games of chance, defined as a sequence of bets, observations, and rewards. The prediction schemes (and bounds) are compared to the popular predictors based on expert advice. Extensions to infinite alphabets, partial, delayed and probabilistic prediction, classification, and more active systems are briefly discussed.
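
The Bayes-mixture construction in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper. It assumes a finite class of Bernoulli sources over a binary alphabet, with parameters strictly between 0 and 1, and the function names (mixture_joint, mixture_predict, bayes_optimal_action) and the loss-table layout are hypothetical choices made for illustration. The sketch computes the mixture predictive probability by updating the weights with Bayes' rule, and then picks the action that minimises expected loss under the mixture.

```python
from math import prod
from typing import Dict, List, Sequence, Tuple


def mixture_joint(thetas: Sequence[float], weights: Sequence[float],
                  seq: Sequence[int]) -> float:
    """xi(x_1...x_n): weighted sum of the Bernoulli probabilities of seq."""
    return sum(
        w * prod(th if x == 1 else 1.0 - th for x in seq)
        for th, w in zip(thetas, weights)
    )


def mixture_predict(thetas: Sequence[float], weights: Sequence[float],
                    history: Sequence[int]) -> Tuple[float, List[float]]:
    """Return xi(x_t = 1 | history) and the posterior weights.

    Assumes every theta lies strictly in (0, 1), so the normaliser is never zero.
    """
    post = list(weights)
    for x in history:
        # Bayes' rule: reweight each source by the probability it gave to x.
        post = [w * (th if x == 1 else 1.0 - th) for w, th in zip(post, thetas)]
        total = sum(post)
        post = [w / total for w in post]
    p_one = sum(w * th for w, th in zip(post, thetas))
    return p_one, post


def bayes_optimal_action(p_one: float, loss: Dict[Tuple[int, int], float]) -> int:
    """Pick the action y minimising the expected loss sum_x p(x) * loss[(x, y)]."""
    probs = {0: 1.0 - p_one, 1: p_one}
    actions = sorted({y for (_, y) in loss})
    return min(actions, key=lambda y: sum(probs[x] * loss[(x, y)] for x in probs))


# Illustrative values only: three candidate sources with uniform prior weights.
thetas = [0.2, 0.5, 0.8]
weights = [1 / 3, 1 / 3, 1 / 3]
history = [1, 1, 1, 1]

p_one, post = mixture_predict(thetas, weights, history)
zero_one_loss = {(x, y): float(x != y) for x in (0, 1) for y in (0, 1)}
action = bayes_optimal_action(p_one, zero_one_loss)
print(p_one, post, action)
```

In this sketch the sequential weight update and the chain-rule ratio ξ(x_1...x_t) / ξ(x_1...x_{t-1}) give the same predictive probability. Changing the loss table changes the chosen action without changing the mixture itself.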