We investigate on-line prediction of individual sequences. Given a class of predictors, the goal is to predict as well as the best predictor in the class, where the loss is measured by the self-information (logarithmic) loss function. The excess loss (regret) is closely related to the redundancy of the associated lossless universal code. Using Shtarkov's theorem and tools from empirical process theory, we prove a general upper bound on the best possible (minimax) regret. The bound depends on certain metric properties of the class of predictors. We apply the bound to both parametric and nonparametric classes of predictors. Finally, we point out the suboptimal behavior of the popular Bayesian weighted average algorithm.
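For reference, a minimal sketch of the quantities the abstract refers to, in standard notation (the symbols $\mathcal{F}$, $q$, and $x^n$ are our own shorthand, not taken from the paper). The cumulative self-information loss of a predictor $f$ on a sequence $x^n = (x_1, \dots, x_n)$, and the regret of a prediction strategy $q$ against the class $\mathcal{F}$, are
\[
L_n(f, x^n) = -\log f(x^n) = -\sum_{t=1}^{n} \log f(x_t \mid x_1, \dots, x_{t-1}),
\qquad
R_n(q, x^n) = -\log q(x^n) - \inf_{f \in \mathcal{F}} \bigl( -\log f(x^n) \bigr).
\]
Shtarkov's theorem states that the minimax regret equals the logarithm of the Shtarkov sum,
\[
\min_{q} \max_{x^n} R_n(q, x^n) = \log \sum_{x^n \in \mathcal{X}^n} \sup_{f \in \mathcal{F}} f(x^n),
\]
and is attained by the normalized maximum-likelihood distribution $q^*(x^n) \propto \sup_{f \in \mathcal{F}} f(x^n)$ (assuming the sum is finite, e.g. for a finite alphabet $\mathcal{X}$).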