Thompson sampling: an asymptotically optimal finite-time analysis

  • Authors: Emilie Kaufmann; Nathaniel Korda; Rémi Munos

  • Affiliations: Telecom Paristech UMR CNRS 5141, France; INRIA Lille-Nord Europe, France; INRIA Lille-Nord Europe, France

  • Venue: ALT'12 Proceedings of the 23rd international conference on Algorithmic Learning Theory

  • Year: 2012

Abstract

The question of the optimality of Thompson Sampling for solving the stochastic multi-armed bandit problem had been open since 1933. In this paper we answer it positively for the case of Bernoulli rewards by providing the first finite-time analysis that matches the asymptotic rate given in the Lai and Robbins lower bound for the cumulative regret. The proof is accompanied by a numerical comparison with other optimal policies, experiments that have been lacking in the literature until now for the Bernoulli case.
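
For readers unfamiliar with the setting, the following is a minimal sketch of Thompson Sampling on a Bernoulli bandit, together with the constant that appears in the Lai and Robbins lower bound, namely the sum over suboptimal arms of (mu* - mu_a) / KL(mu_a, mu*). It is an illustration only and not the authors' code: the uniform Beta(1, 1) prior, the function names (thompson_sampling, bernoulli_kl, lai_robbins_constant), and the arm means used in the usage note are all assumptions made for this example.

```python
import math
import random


def thompson_sampling(means, horizon, seed=0):
    """Simulate Beta-Bernoulli Thompson Sampling.

    Assumes a uniform Beta(1, 1) prior on every arm (an illustrative choice).
    Returns (number of pulls per arm, cumulative expected regret).
    """
    rng = random.Random(seed)
    k = len(means)
    successes = [0] * k
    failures = [0] * k
    best = max(means)
    regret = 0.0
    for _ in range(horizon):
        # Draw one sample per arm from its posterior Beta(successes + 1, failures + 1).
        samples = [rng.betavariate(successes[a] + 1, failures[a] + 1) for a in range(k)]
        # Play the arm whose sampled mean is largest.
        arm = max(range(k), key=samples.__getitem__)
        # Observe a Bernoulli reward and update that arm's posterior counts.
        reward = 1 if rng.random() < means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        regret += best - means[arm]
    pulls = [successes[a] + failures[a] for a in range(k)]
    return pulls, regret


def bernoulli_kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), for p, q strictly in (0, 1)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def lai_robbins_constant(means):
    """Sum over suboptimal arms of gap / KL(arm mean, best mean)."""
    best = max(means)
    return sum((best - m) / bernoulli_kl(m, best) for m in means if m < best)
```

As a usage example, calling thompson_sampling([0.1, 0.9], 2000) returns the per-arm pull counts and the cumulative regret for a hypothetical two-armed problem, and lai_robbins_constant([0.1, 0.9]) * math.log(2000) gives the corresponding asymptotic rate to compare against. The lower bound is asymptotic, so the regret at a fixed finite horizon need not lie above that value.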