Lower bounds and selectivity of weak-consistent policies in stochastic multi-armed bandit problem

Authors:
Antoine Salomon;Jean-Yves Audibert;Issam El Alaoui
Affiliations:
Imagine, Université Paris-Est, Champs-sur-Marne, France;Imagine, Université Paris-Est, Champs-sur-Marne, France and CNRS, ENS, INRIA, UMR;Imagine, Université Paris-Est, Champs-sur-Marne, France
Venue:
The Journal of Machine Learning Research
Year:
2013

Citing 7
Cited 0

Optimal Adaptive Policies for Sequential Allocation Problems
Advances in Applied Mathematics

How to use expert advice
Journal of the ACM (JACM)

The Nonstochastic Multiarmed Bandit Problem
SIAM Journal on Computing

Finite-time Analysis of the Multiarmed Bandit Problem
Machine Learning

Prediction, Learning, and Games

Multi-armed bandits in metric spaces
STOC '08 Proceedings of the fortieth annual ACM symposium on Theory of computing

Exploration-exploitation tradeoff using variance estimates in multi-armed bandits
Theoretical Computer Science

Abstract

This paper is devoted to regret lower bounds in the classical stochastic multi-armed bandit model. A well-known result of Lai and Robbins, later extended by Burnetas and Katehakis, established a logarithmic lower bound on the regret of all consistent policies. We relax the notion of consistency and exhibit a generalisation of this bound. We also study the existence of logarithmic bounds in general and in the case of Hannan consistency. Moreover, we prove that it is impossible to design an adaptive policy that would select the best of two algorithms by taking advantage of the properties of the environment. To obtain these results, we study variants of popular Upper Confidence Bound (UCB) policies.
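
The abstract refers to UCB policies without describing them. As a rough illustration only, the following is a minimal sketch of the standard UCB1 index policy (from "Finite-time Analysis of the Multiarmed Bandit Problem", listed above) run on Bernoulli arms. It is not one of the variants studied in the paper. The function name `ucb1`, the Bernoulli reward model, the arm means and the seed are all assumptions chosen for the example.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run standard UCB1 on Bernoulli arms (illustrative assumption).

    Returns the number of pulls of each arm and the cumulative
    pseudo-regret, i.e. the sum of gaps between the best mean and
    the mean of the arm actually pulled.
    """
    rng = random.Random(seed)  # local generator, so runs are reproducible
    n_arms = len(arm_means)
    counts = [0] * n_arms      # pulls per arm
    sums = [0.0] * n_arms      # total reward per arm
    best_mean = max(arm_means)
    regret = 0.0

    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Initialisation: pull every arm once.
            arm = t - 1
        else:
            # Index = empirical mean + exploration bonus sqrt(2 ln t / pulls).
            arm = max(
                range(n_arms),
                key=lambda k: sums[k] / counts[k]
                + math.sqrt(2.0 * math.log(t) / counts[k]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best_mean - arm_means[arm]

    return counts, regret


if __name__ == "__main__":
    pulls, pseudo_regret = ucb1([0.2, 0.8], horizon=2000)
    print("pulls per arm:", pulls)
    print("pseudo-regret:", pseudo_regret)
```

In this sketch the suboptimal arm is pulled a number of times that grows only logarithmically with the horizon, which is the regime that the logarithmic lower bounds discussed in the abstract address.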