COLT '90 Proceedings of the third annual workshop on Computational learning theory
The weighted majority algorithm. Information and Computation
Exponentiated gradient versus gradient descent for linear predictors. Information and Computation
Journal of the ACM (JACM)
The binary exponentiated gradient algorithm for learning linear functions. COLT '97 Proceedings of the tenth annual conference on Computational learning theory
A game of prediction with expert advice. Journal of Computer and System Sciences - Special issue on the eighth annual workshop on computational learning theory, July 5–8, 1995
Machine Learning - Special issue on context sensitivity and concept drift
Derandomizing Stochastic Prediction Strategies. Machine Learning - Special issue: computational learning theory, COLT '97
Computers and Intractability: A Guide to the Theory of NP-Completeness
Adaptive and Self-Confident On-Line Learning Algorithms. COLT '00 Proceedings of the Thirteenth Annual Conference on Computational Learning Theory
Tracking the best linear predictor. The Journal of Machine Learning Research
UAI'98 Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence
Coding for a binary independent piecewise-identically-distributed source. IEEE Transactions on Information Theory - Part 2
Sequential prediction of individual sequences under general loss functions. IEEE Transactions on Information Theory
Low-complexity sequential lossless coding for piecewise-stationary memoryless sources. IEEE Transactions on Information Theory
Using additive expert ensembles to cope with concept drift. ICML '05 Proceedings of the 22nd international conference on Machine learning
Online kernel PCA with entropic matrix updates. Proceedings of the 24th international conference on Machine learning
Real-time ranking with concept drift using expert advice. Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Dynamic Weighted Majority: An Ensemble Method for Drifting Concepts. The Journal of Machine Learning Research
Efficient learning algorithms for changing environments. ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Discrete denoising with shifts. IEEE Transactions on Information Theory
Learning Permutations with Exponential Weights. The Journal of Machine Learning Research
Learning permutations with exponential weights. COLT'07 Proceedings of the 20th annual conference on Learning theory
Multitask learning with expert advice. COLT'07 Proceedings of the 20th annual conference on Learning theory
Combining initial segments of lists. ALT'11 Proceedings of the 22nd international conference on Algorithmic learning theory
The shortest path problem under partial monitoring. COLT'06 Proceedings of the 19th annual conference on Learning Theory
Tracking the best of many experts. COLT'05 Proceedings of the 18th annual conference on Learning Theory
A closer look at adaptive regret. ALT'12 Proceedings of the 23rd international conference on Algorithmic Learning Theory
Combining initial segments of lists. Theoretical Computer Science
In this paper, we examine on-line learning problems in which the target concept is allowed to change over time. In each trial a master algorithm receives predictions from a large set of n experts. Its goal is to predict almost as well as the best sequence of such experts chosen off-line by partitioning the training sequence into k+1 sections and then choosing the best expert for each section. We build on methods developed by Herbster and Warmuth and consider an open problem posed by Freund in which the experts of the best partition come from a small pool of size m. Since k ≫ m, the best expert shifts back and forth between the experts of the small pool. We propose algorithms that solve this open problem by mixing the past posteriors maintained by the master algorithm. We relate the number of bits needed for encoding the best partition to the loss bounds of the algorithms. Instead of paying log n for choosing the best expert in each section, we first pay log(n choose m) bits in the bounds for identifying the pool of m experts and then log m bits per new section. In the bounds we also pay twice for encoding the boundaries of the sections.
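As a rough illustration of the mixing idea described above (not the exact algorithms or bounds from the paper), the following Python sketch maintains exponential weights over the experts and, at each trial, shares a small fraction of the weight with the average of all past posteriors. The function name, the uniform mixing scheme, and the parameter values alpha and eta are assumptions made for the example.

```python
import numpy as np

def mix_past_posteriors(expert_losses, eta=1.0, alpha=0.05):
    """Sketch of expert tracking by mixing past posteriors.

    expert_losses: T x n array, loss of each of the n experts in each of T trials.
    eta:   learning rate of the exponential weight (loss) update.
    alpha: fraction of weight redistributed over past posteriors each trial.
    Returns the T x n array of weight vectors used for prediction.
    """
    T, n = expert_losses.shape
    w = np.full(n, 1.0 / n)       # current weight vector (uniform prior)
    past = [w.copy()]             # past posteriors, including the prior
    used = np.zeros((T, n))

    for t in range(T):
        used[t] = w
        # Loss update: exponential weights on the experts' current losses.
        v = w * np.exp(-eta * expert_losses[t])
        v /= v.sum()
        # Mixing update: keep most of the mass on the new posterior and
        # spread an alpha-fraction uniformly over all past posteriors,
        # so experts that were good in earlier sections can regain weight.
        w = (1.0 - alpha) * v + alpha * np.mean(past, axis=0)
        past.append(v.copy())
    return used
```

Sharing weight with past posteriors, rather than only with the uniform prior as in the earlier Fixed-Share style algorithms, is what allows an expert from the small pool that was best in an earlier section to recover its weight quickly when it becomes best again.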