Minimax regret under log loss for general classes of experts
COLT '99 Proceedings of the twelfth annual conference on Computational learning theory
On prediction of individual sequences relative to a set of experts in the presence of noise
COLT '99 Proceedings of the twelfth annual conference on Computational learning theory
Tracking a Small Set of Experts by Mixing Past Posteriors
COLT '01/EuroCOLT '01 Proceedings of the 14th Annual Conference on Computational Learning Theory and 5th European Conference on Computational Learning Theory
Potential-Based Algorithms in Online Prediction and Game Theory
COLT '01/EuroCOLT '01 Proceedings of the 14th Annual Conference on Computational Learning Theory and 5th European Conference on Computational Learning Theory
Discrete Prediction Games with Arbitrary Feedback and Loss
COLT '01/EuroCOLT '01 Proceedings of the 14th Annual Conference on Computational Learning Theory and 5th European Conference on Computational Learning Theory
Learning Additive Models Online with Fast Evaluating Kernels
COLT '01/EuroCOLT '01 Proceedings of the 14th Annual Conference on Computational Learning Theory and 5th European Conference on Computational Learning Theory
Mixability and the Existence of Weak Complexities
COLT '02 Proceedings of the 15th Annual Conference on Computational Learning Theory
Tracking the best linear predictor
The Journal of Machine Learning Research
Tracking a small set of experts by mixing past posteriors
The Journal of Machine Learning Research
Optimality of universal Bayesian sequence prediction for general loss and alphabet
The Journal of Machine Learning Research
Superior Guarantees for Sequential Prediction and Lossless Compression via Alphabet Decomposition
The Journal of Machine Learning Research
Prediction with expert advice for the Brier game
The Journal of Machine Learning Research
Prediction with expert evaluators' advice
ALT'09 Proceedings of the 20th international conference on Algorithmic learning theory
Prediction with expert advice under discounted loss
ALT'10 Proceedings of the 21st international conference on Algorithmic learning theory
Relative loss bounds for on-line density estimation with the exponential family of distributions
UAI'99 Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence
A randomized online learning algorithm for better variance control
COLT'06 Proceedings of the 19th annual conference on Learning Theory
Continuous experts and the binning algorithm
COLT'06 Proceedings of the 19th annual conference on Learning Theory
On-line regression competitive with reproducing kernel Hilbert spaces
TAMC'06 Proceedings of the Third international conference on Theory and Applications of Models of Computation
The weak aggregating algorithm and weak mixability
COLT'05 Proceedings of the 18th annual conference on Learning Theory
Sparse regression learning by aggregation and Langevin Monte-Carlo
Journal of Computer and System Sciences
Mixability is Bayes risk curvature relative to log loss
The Journal of Machine Learning Research
We consider adaptive sequential prediction of arbitrary binary sequences when performance is evaluated using a general loss function. The goal is to predict on each individual sequence nearly as well as the best prediction strategy in a given comparison class of (possibly adaptive) prediction strategies, called experts. By using a general loss function, we generalize previous work on universal prediction, forecasting, and data compression. However, here we restrict ourselves to the case when the comparison class is finite. For a given sequence, we define the regret as the total loss on the entire sequence suffered by the adaptive sequential predictor, minus the total loss suffered by the predictor in the comparison class that performs best on that particular sequence. We show that for a large class of loss functions, the minimax regret is either Θ(log N) or Ω(√(L log N)), depending on the loss function, where N is the number of predictors in the comparison class and L is the length of the sequence to be predicted. The former case was shown previously by Vovk (1990); we give a simplified analysis with an explicit closed form for the constant in the minimax regret formula, and give a probabilistic argument that shows this constant is the best possible. Some weak regularity conditions are imposed on the loss function in obtaining these results. We also extend our analysis to the case of predicting arbitrary sequences that take real values in the interval [0,1].
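To make the √(L log N) regime concrete: that rate is matched, up to a constant, by the exponentially weighted average forecaster. The following is a minimal Python sketch, not taken from the paper; the loss matrix, the tuned learning rate, and the convention that the forecaster suffers the weight-averaged expert loss are illustrative assumptions.

```python
import numpy as np

def exp_weights_loss(expert_losses, eta):
    """Exponentially weighted average forecaster (a minimal sketch).

    expert_losses: (L, N) array; entry (t, i) is the loss of expert i
                   at round t, assumed to lie in [0, 1].
    eta:           learning rate > 0.

    Returns the forecaster's cumulative loss under the convention that,
    at each round, it suffers the weight-averaged loss of the experts
    (valid via Jensen's inequality when the loss is convex in the
    prediction).
    """
    L, N = expert_losses.shape
    log_w = np.zeros(N)              # log-weights, start uniform
    total = 0.0
    for t in range(L):
        p = np.exp(log_w - log_w.max())
        p /= p.sum()                 # current mixture over experts
        total += p @ expert_losses[t]    # forecaster's loss this round
        log_w -= eta * expert_losses[t]  # multiplicative weight update
    return total

# Illustration: regret against the best of N experts over L rounds.
rng = np.random.default_rng(0)
L, N = 1000, 16
losses = rng.uniform(size=(L, N))
eta = np.sqrt(8 * np.log(N) / L)     # tuned rate for [0,1]-valued losses
alg = exp_weights_loss(losses, eta)
best = losses.sum(axis=0).min()
print(f"regret = {alg - best:.2f}, bound = {np.sqrt(L * np.log(N) / 2):.2f}")
```

For [0,1]-valued losses, the tuned rate η = √(8 ln N / L) gives the standard guarantee that the regret is at most √((L/2) ln N), matching the Ω(√(L log N)) lower bound up to a constant; for mixable losses such as log loss, running the same scheme with a fixed learning rate (Vovk's aggregating algorithm) yields the Θ(log N) regime instead.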