We investigate on-line prediction of individual sequences. Given a class of predictors, the goal is to predict as well as the best predictor in the class, where the loss is measured by the self-information (logarithmic) loss function. The excess loss (regret) is closely related to the redundancy of the associated lossless universal code. Using Shtarkov's theorem and tools from empirical process theory, we prove a general upper bound on the best possible (minimax) regret. The bound depends on certain metric properties of the class of predictors. We apply the bound to both parametric and nonparametric classes of predictors. Finally, we point out the suboptimal behavior of the popular Bayesian weighted average algorithm.
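For reference, a minimal sketch of the quantities the abstract refers to, in standard notation (the symbols $\mathcal{F}$, $q$, and $x^n$ are our own shorthand, not taken from the paper). The cumulative self-information loss of a predictor $f$ on a sequence $x^n = (x_1, \dots, x_n)$, and the regret of a prediction strategy $q$ against the class $\mathcal{F}$, are
\[
L_n(f, x^n) = -\log f(x^n) = -\sum_{t=1}^{n} \log f(x_t \mid x_1, \dots, x_{t-1}),
\qquad
R_n(q, x^n) = -\log q(x^n) - \inf_{f \in \mathcal{F}} \bigl( -\log f(x^n) \bigr).
\]
Shtarkov's theorem states that the minimax regret equals the logarithm of the Shtarkov sum,
\[
\min_{q} \max_{x^n} R_n(q, x^n) = \log \sum_{x^n \in \mathcal{X}^n} \sup_{f \in \mathcal{F}} f(x^n),
\]
and is attained by the normalized maximum-likelihood distribution $q^*(x^n) \propto \sup_{f \in \mathcal{F}} f(x^n)$ (assuming the sum is finite, e.g. for a finite alphabet $\mathcal{X}$).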