We consider on-line algorithms for predicting binary or continuous-valued outcomes when the algorithm has available the predictions made by N experts. For a sequence of trials, we compute the total losses of both the algorithm and the experts under a loss function. At the end of the trial sequence, we compare the total loss of the algorithm to the total loss of the best expert, i.e., the expert with the least loss on that particular trial sequence. We show that for a large class of loss functions, with binary outcomes the total loss of the algorithm proposed by Vovk exceeds the total loss of the best expert by at most c ln N, where c is a constant determined by the loss function. This upper bound makes no assumptions about how the experts' predictions or the outcomes are generated, and the trial sequence can be arbitrarily long. We give a straightforward method for finding the correct value of c and show by a lower bound that for this value of c the upper bound is asymptotically tight. The lower bound is based on a probabilistic adversary argument. The class of loss functions for which the c ln N upper bound holds includes the square loss, the logarithmic loss, and the Hellinger loss. We also consider another class of loss functions, including the absolute loss, for which we prove an Ω((ℓ log N)^(1/2)) lower bound, where ℓ is the number of trials. We show that for the square and logarithmic loss functions, Vovk's algorithm achieves the same worst-case upper bounds with continuous-valued outcomes as with binary outcomes. For the absolute loss, we show how bounds earlier achieved for binary outcomes can be achieved with continuous-valued outcomes using a slightly more complicated algorithm.
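To make the expert-advice setting above concrete, here is a minimal Python sketch of an exponentially weighted forecaster: each expert is weighted by the exponential of its negative cumulative loss, and the algorithm predicts a weighted average of the experts' predictions. This is an illustrative stand-in, not the paper's algorithm; Vovk's aggregating algorithm predicts through a loss-specific substitution function rather than a weighted average, which is what yields the sharper constant c in the c ln N bound. The class name, the learning-rate parameter eta, and the toy data are all assumptions introduced here for illustration.

```python
import math

class ExponentiallyWeightedForecaster:
    """Simplified expert-advice forecaster: one weight per expert, prediction by
    weighted average, with each expert discounted exponentially in its loss.
    (Illustrative sketch only; Vovk's aggregating algorithm uses a loss-specific
    substitution function instead of the weighted average.)"""

    def __init__(self, n_experts, eta=0.5):
        # eta = 1/2 is the exp-concavity constant of the square loss on [0, 1];
        # for this simple forecaster it gives regret at most (ln N) / eta = 2 ln N.
        self.eta = eta
        self.weights = [1.0] * n_experts  # uniform prior over the N experts

    def predict(self, expert_predictions):
        total = sum(self.weights)
        return sum(w * p for w, p in zip(self.weights, expert_predictions)) / total

    def update(self, expert_predictions, outcome):
        # Multiplicative update: discount each expert by exp(-eta * loss),
        # here using the square loss (p - y)^2.
        for i, p in enumerate(expert_predictions):
            self.weights[i] *= math.exp(-self.eta * (p - outcome) ** 2)


# Toy run: four constant experts and outcomes near 0.7. The forecaster's
# cumulative square loss should exceed the best expert's by only O(ln N).
experts = [0.1, 0.3, 0.6, 0.9]
alg = ExponentiallyWeightedForecaster(len(experts))
alg_loss, expert_losses = 0.0, [0.0] * len(experts)
for t in range(200):
    y = 0.7 if t % 3 else 0.6          # continuous-valued outcomes in [0, 1]
    p = alg.predict(experts)
    alg_loss += (p - y) ** 2
    for i, e in enumerate(experts):
        expert_losses[i] += (e - y) ** 2
    alg.update(experts, y)
print(alg_loss - min(expert_losses))   # observed regret against the best expert
```

The design point the sketch illustrates is that the regret bound is independent of the number of trials and of how the experts or outcomes are generated; only the number of experts N and the loss-dependent constant enter the bound.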