In most on-line learning research, the total on-line loss of the algorithm is compared to the total loss of the best off-line predictor u from a comparison class of predictors. We call such bounds static bounds. The interesting feature of these bounds is that they hold for an arbitrary sequence of examples. Recently, some work has been done in which the predictor ut is allowed to change from trial to trial: the total on-line loss of the algorithm is compared to the sum over trials t of the loss of ut, plus a total "cost" for shifting between successive predictors. This models situations in which the examples change over time, so that different predictors from the comparison class are best for different segments of the sequence of examples. We call such bounds shifting bounds. They hold for arbitrary sequences of examples and arbitrary sequences of predictors.

Naturally, shifting bounds are much harder to prove. The only previously known bounds are for the case when the comparison class consists of sequences of experts or of Boolean disjunctions. In this paper we develop a methodology for lifting known static bounds to the shifting case. In particular, we obtain bounds when the comparison class consists of linear neurons (linear combinations of experts). Our essential technique is to project the hypothesis of the static algorithm at the end of each trial onto a suitably chosen convex region. This keeps the hypothesis of the algorithm well-behaved, and the static bounds can then be converted into shifting bounds.
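The projection idea can be illustrated with a minimal sketch: an otherwise static on-line algorithm (here, gradient descent on the squared loss, as a stand-in for the static algorithms treated in the paper) is followed at the end of each trial by a projection of the weight vector onto a convex region (here, an L2 ball; the choice of region, learning rate, and radius are illustrative assumptions, not the paper's specific construction).

```python
import numpy as np

def project_l2_ball(w, radius):
    """Project w onto the L2 ball of the given radius (a convex region)."""
    norm = np.linalg.norm(w)
    return w if norm <= radius else w * (radius / norm)

def shifting_gd(examples, lr=0.1, radius=1.0):
    """On-line gradient descent on the squared loss, with a projection
    step after each trial that keeps the hypothesis well-behaved.
    lr and radius are illustrative hyperparameters, not from the paper."""
    d = len(examples[0][0])
    w = np.zeros(d)
    total_loss = 0.0
    for x, y in examples:
        x = np.asarray(x, dtype=float)
        y_hat = w @ x                       # predict
        total_loss += (y_hat - y) ** 2      # incur squared loss
        w = w - lr * 2 * (y_hat - y) * x    # static GD update
        w = project_l2_ball(w, radius)      # projection step after the trial
    return w, total_loss
```

Keeping the hypothesis inside a bounded convex region is what allows a static loss bound to be reused after the best comparison predictor shifts: the algorithm can never drift arbitrarily far from any new predictor in the class.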