The goal of a learner in standard online learning is to suffer cumulative loss not much larger than that of the best-performing prediction function from some fixed class. Numerous algorithms have been shown to drive this gap arbitrarily close to zero relative to the best function chosen offline. Nevertheless, many real-world applications (such as adaptive filtering) are non-stationary in nature, and the best prediction function may not be fixed but may drift over time. We introduce a new algorithm for regression that uses per-feature learning rates and provide a regret bound with respect to the best sequence of drifting functions. We show that as long as the cumulative drift is sub-linear in the length of the sequence, our algorithm suffers sub-linear regret as well. We also sketch an algorithm that achieves the best of both worlds: O(log T) regret in the stationary setting and sub-linear regret in the non-stationary setting. Simulations demonstrate the usefulness of our algorithm compared with other state-of-the-art approaches.
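To make the idea of per-feature learning rates concrete, here is a minimal sketch of an online least-squares learner in which each coordinate shrinks its own step size as that feature accumulates signal. This is an illustrative AdaGrad-style construction under our own assumptions, not the paper's exact algorithm or its regret-optimal tuning; `eps` and the update rule are choices made for the sketch.

```python
import numpy as np

def per_feature_ogd(X, y, eps=1.0):
    """Online regression with a separate learning rate per feature.

    Illustrative sketch (not the paper's algorithm): feature j keeps an
    accumulator s[j] of its squared inputs, and its effective step size
    decays as 1 / (eps + s[j]), so frequently active features adapt more
    slowly while rare features keep a large learning rate.
    Returns the sequence of per-round squared losses.
    """
    T, d = X.shape
    w = np.zeros(d)
    s = np.zeros(d)  # per-feature accumulated squared inputs
    losses = np.empty(T)
    for t in range(T):
        x_t, y_t = X[t], y[t]
        y_hat = w @ x_t
        losses[t] = (y_hat - y_t) ** 2
        s += x_t ** 2
        # gradient of (1/2)(y_hat - y_t)^2, scaled per-coordinate
        w -= (y_hat - y_t) * x_t / (eps + s)
    return losses
```

On stationary data the per-round loss decreases as the per-feature step sizes shrink; under drift, one would additionally keep the accumulators from growing without bound (e.g. by discounting), which is the kind of trade-off the regret analysis quantifies.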