We consider on-line algorithms for predicting binary or continuous-valued outcomes when the algorithm has available the predictions made by N experts. For a sequence of trials, we compute the total losses of both the algorithm and the experts under a loss function. At the end of the trial sequence, we compare the total loss of the algorithm to the total loss of the best expert, i.e., the expert with the least loss on that particular trial sequence. We show that for a large class of loss functions, with binary outcomes the total loss of the algorithm proposed by Vovk exceeds the total loss of the best expert by at most c ln N, where c is a constant determined by the loss function. This upper bound makes no assumptions about how the experts' predictions or the outcomes are generated, and the trial sequence can be arbitrarily long. We give a straightforward method for finding the correct value of c and show by a lower bound that for this value of c the upper bound is asymptotically tight. The lower bound is based on a probabilistic adversary argument. The class of loss functions for which the c ln N upper bound holds includes the square loss, the logarithmic loss, and the Hellinger loss. We also consider another class of loss functions, including the absolute loss, for which we prove an Ω((ℓ log N)^(1/2)) lower bound, where ℓ is the number of trials. We show that for the square and logarithmic loss functions, Vovk's algorithm achieves the same worst-case upper bounds with continuous-valued outcomes as with binary outcomes. For the absolute loss, we show how bounds earlier achieved for binary outcomes can be achieved with continuous-valued outcomes using a slightly more complicated algorithm.
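To make the expert-advice setting above concrete, here is a minimal Python sketch of an exponentially weighted forecaster: each expert is weighted by the exponential of its negative cumulative loss, and the algorithm predicts a weighted average of the experts' predictions. This is an illustrative stand-in, not the paper's algorithm; Vovk's aggregating algorithm predicts through a loss-specific substitution function rather than a weighted average, which is what yields the sharper constant c in the c ln N bound. The class name, the learning-rate parameter eta, and the toy data are all assumptions introduced here for illustration.

```python
import math

class ExponentiallyWeightedForecaster:
    """Simplified expert-advice forecaster: one weight per expert, prediction by
    weighted average, with each expert discounted exponentially in its loss.
    (Illustrative sketch only; Vovk's aggregating algorithm uses a loss-specific
    substitution function instead of the weighted average.)"""

    def __init__(self, n_experts, eta=0.5):
        # eta = 1/2 is the exp-concavity constant of the square loss on [0, 1];
        # for this simple forecaster it gives regret at most (ln N) / eta = 2 ln N.
        self.eta = eta
        self.weights = [1.0] * n_experts  # uniform prior over the N experts

    def predict(self, expert_predictions):
        total = sum(self.weights)
        return sum(w * p for w, p in zip(self.weights, expert_predictions)) / total

    def update(self, expert_predictions, outcome):
        # Multiplicative update: discount each expert by exp(-eta * loss),
        # here using the square loss (p - y)^2.
        for i, p in enumerate(expert_predictions):
            self.weights[i] *= math.exp(-self.eta * (p - outcome) ** 2)


# Toy run: four constant experts and outcomes near 0.7. The forecaster's
# cumulative square loss should exceed the best expert's by only O(ln N).
experts = [0.1, 0.3, 0.6, 0.9]
alg = ExponentiallyWeightedForecaster(len(experts))
alg_loss, expert_losses = 0.0, [0.0] * len(experts)
for t in range(200):
    y = 0.7 if t % 3 else 0.6          # continuous-valued outcomes in [0, 1]
    p = alg.predict(experts)
    alg_loss += (p - y) ** 2
    for i, e in enumerate(experts):
        expert_losses[i] += (e - y) ** 2
    alg.update(experts, y)
print(alg_loss - min(expert_losses))   # observed regret against the best expert
```

The design point the sketch illustrates is that the regret bound is independent of the number of trials and of how the experts or outcomes are generated; only the number of experts N and the loss-dependent constant enter the bound.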