Tracking the Best Expert

  • Authors:
  • Mark Herbster; Manfred K. Warmuth

  • Affiliations:
  • Department of Computer Science, University of California at Santa Cruz, Applied Sciences Building, Santa Cruz, CA 95064. E-mail: (mark|manfred)@cs.ucsc.edu

  • Venue:
  • Machine Learning - Special issue on context sensitivity and concept drift
  • Year:
  • 1998

Abstract

We generalize the recent relative loss bounds for on-line algorithms where the additional loss of the algorithm on the whole sequence of examples over the loss of the best expert is bounded. The generalization allows the sequence to be partitioned into segments, and the goal is to bound the additional loss of the algorithm over the sum of the losses of the best experts for each segment. This is to model situations in which the examples change and different experts are best for certain segments of the sequence of examples. In the single segment case, the additional loss is proportional to log n, where n is the number of experts and the constant of proportionality depends on the loss function. Our algorithms do not produce the best partition; however, the loss bound shows that our predictions are close to those of the best partition. When the number of segments is k+1 and the sequence is of length ℓ, we can bound the additional loss of our algorithm over the best partition by O(k log n + k log(ℓ/k)). For the case when the loss per trial is bounded by one, we obtain an algorithm whose additional loss over the loss of the best partition is independent of the length of the sequence. The additional loss becomes O(k log n + k log(L/k)), where L is the loss of the best partition with k+1 segments. Our algorithms for tracking the predictions of the best expert are simple adaptations of Vovk's original algorithm for the single best expert case. As in the original algorithms, we keep one weight per expert, and spend O(1) time per weight in each trial.
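
The abstract does not spell out the tracking algorithm itself, but its description (a Vovk-style exponential weight update per expert, plus a mechanism that lets a previously poor expert regain weight when a new segment begins) suggests a fixed-share style update. The sketch below is an illustrative Python rendering of that idea, not the paper's exact algorithm; the square loss, the learning rate eta, and the share rate alpha are assumptions chosen for the example.

```python
import numpy as np

def track_best_expert(expert_predictions, outcomes, eta=1.0, alpha=0.05):
    """Illustrative fixed-share style tracking sketch (not the paper's exact update).

    expert_predictions: array of shape (T, n), prediction of each of n experts per trial.
    outcomes: array of shape (T,), observed outcomes.
    eta: learning rate for the exponential weight update (assumed value).
    alpha: fraction of weight redistributed each trial (assumed value).
    Returns the algorithm's prediction for each trial.
    """
    T, n = expert_predictions.shape
    w = np.full(n, 1.0 / n)            # one weight per expert
    predictions = np.empty(T)

    for t in range(T):
        # Predict with the weighted average of the experts' predictions.
        predictions[t] = w @ expert_predictions[t]

        # Loss update: downweight each expert exponentially in its loss
        # (square loss used here purely for illustration).
        losses = (expert_predictions[t] - outcomes[t]) ** 2
        w *= np.exp(-eta * losses)
        w /= w.sum()

        # Share update: redistribute a fraction alpha of the total weight
        # uniformly, so an expert whose weight has collapsed can recover
        # quickly when a new segment starts and it becomes the best expert.
        w = (1 - alpha) * w + alpha / n

    return predictions
```

Both steps touch each weight a constant number of times, so the per-trial cost is O(1) per weight, matching the cost stated in the abstract.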