We consider the problem of online learning in an adversarial environment when the reward functions chosen by the adversary are assumed to be Lipschitz. This setting extends previous work on linear and convex online learning. We provide a class of algorithms whose cumulative regret is upper-bounded by Õ(√(dT) ln(λ)), where d is the dimension of the search space, T is the time horizon, and λ is the Lipschitz constant. Efficient numerical implementations using particle methods are discussed. Applications include online supervised learning problems in both the full-information and partial-information (bandit) settings, for a large class of non-linear regressors/classifiers, such as neural networks.
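To make the setting concrete, the following is a minimal sketch, not the paper's algorithm: it runs an exponentially weighted average forecaster (Hedge) over a uniform discretization of a one-dimensional domain [0, 1] with full-information feedback. The idea is that a λ-Lipschitz reward function cannot vary much between neighboring grid points, so a fine enough grid contains a point nearly as good as the continuum optimum, and Hedge then competes with the best grid point. The function name, grid size, and learning rate `eta` are all illustrative choices; the paper's particle-based implementations and bandit-feedback variants are not reproduced here.

```python
import math

def hedge_lipschitz(reward_fns, n_points=50, eta=0.1):
    """Exponentially weighted forecaster on a uniform grid of [0, 1].

    reward_fns: one reward function per round, each mapping [0, 1] -> [0, 1];
    returns (expected cumulative reward, regret vs. the best fixed grid point).
    Illustrative sketch only -- not the particle method from the paper.
    """
    points = [i / (n_points - 1) for i in range(n_points)]
    log_w = [0.0] * n_points  # log-weights, for numerical stability
    total_reward = 0.0
    for f in reward_fns:
        # Normalize weights into a probability distribution over grid points.
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        s = sum(w)
        probs = [x / s for x in w]
        # Expected reward of the randomized play this round.
        rewards = [f(p) for p in points]
        total_reward += sum(p * r for p, r in zip(probs, rewards))
        # Full-information multiplicative update on every grid point.
        log_w = [lw + eta * r for lw, r in zip(log_w, rewards)]
    best = max(sum(f(p) for f in reward_fns) for p in points)
    return total_reward, best - total_reward
```

Against a fixed 1-Lipschitz reward such as `lambda x: 1 - abs(x - 0.7)`, the weights concentrate on the grid point nearest 0.7 and the regret stays small relative to the horizon, matching the sublinear-regret behavior the abstract describes.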