We consider a sequential decision problem in which rewards are generated by a piecewise-stationary distribution: the reward distributions themselves are unknown and may change at unknown instants. Our approach uses a limited number of side observations on past rewards and requires no prior knowledge of the frequency of changes. In spite of the adversarial nature of the reward process, we provide an algorithm whose regret, with respect to a baseline with perfect knowledge of the distributions and the change points, is O(k log(T)), where k is the number of changes up to time T. This contrasts with the case where side observations are unavailable, where the regret is at least Ω(√T).
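The abstract does not spell out the algorithm, but the core idea — use occasional side observations to detect a distribution change and restart an otherwise standard bandit strategy — can be illustrated with a minimal sketch built on UCB1. This is an assumption-laden illustration, not the paper's construction: the class name, the sliding-window change detector, and the `window` and `threshold` parameters are all hypothetical choices.

```python
import math
import random

class ChangeDetectingUCB:
    """Illustrative sketch (NOT the paper's exact algorithm): UCB1 plus a
    change detector fed by side observations. `window` and `threshold`
    are assumed tuning parameters."""

    def __init__(self, n_arms, window=20, threshold=0.4):
        self.n_arms = n_arms
        self.window = window        # recent side observations kept per arm
        self.threshold = threshold  # mean shift that triggers a full reset
        self.n_resets = 0
        self.reset()

    def reset(self):
        """Forget all reward statistics (called when a change is detected)."""
        self.counts = [0] * self.n_arms
        self.means = [0.0] * self.n_arms
        self.recent = [[] for _ in range(self.n_arms)]
        self.t = 0

    def select(self):
        """Standard UCB1 arm choice."""
        self.t += 1
        for a in range(self.n_arms):          # play each arm once first
            if self.counts[a] == 0:
                return a
        def ucb(a):
            return self.means[a] + math.sqrt(2.0 * math.log(self.t) / self.counts[a])
        return max(range(self.n_arms), key=ucb)

    def update(self, arm, reward):
        """Incorporate the reward of the arm actually played."""
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]

    def side_observe(self, arm, reward):
        """Record a side observation on `arm`; if the recent window mean
        drifts far from the running estimate, assume a change and reset."""
        buf = self.recent[arm]
        buf.append(reward)
        if len(buf) > self.window:
            buf.pop(0)
        if (len(buf) == self.window and self.counts[arm] > 0
                and abs(sum(buf) / self.window - self.means[arm]) > self.threshold):
            self.n_resets += 1
            self.reset()
```

In use, the caller plays the arm returned by `select`, feeds its reward to `update`, and occasionally passes a reward sample from an unplayed arm to `side_observe`. Resetting on detection is what keeps the per-change cost logarithmic in spirit; a forgetting-free UCB1 would instead adapt to each change only slowly.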