Multi-armed Bandits with Metric Switching Costs

  • Authors:
  • Sudipto Guha;Kamesh Munagala

  • Affiliations:
  • Department of Computer and Information Sciences, University of Pennsylvania, Philadelphia 19104-6389;Department of Computer Science, Duke University, Durham 27708-0129

  • Venue:
  • ICALP '09 Proceedings of the 36th Internatilonal Collogquium on Automata, Languages and Programming: Part II
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

In this paper we consider the stochastic multi-armed bandit with metric switching costs. Given a set of locations (arms) in a metric space and prior information about the reward available at these locations, cost of getting a sample/play at every location and rules to update the prior based on samples/plays, the task is to maximize a certain objective function constrained to a distance cost of L and cost of plays C . This fundamental and well-studied problem models several optimization problems in robot navigation, sensor networks, labor economics, etc. In this paper we develop a general duality-based framework to provide the first O (1) approximation for metric switching costs; the actual constants being quite small. Since these problems are Max-SNP hard, this result is the best possible. The overall technique and the ensuing structural results are independently of interest in the context of bandit problems with complicated side-constraints. Our techniques also improve the approximation ratio of the budgeted learning problem from 4 to 3 + *** .