Approximation algorithms for budgeted learning problems

  • Authors:
  • Sudipto Guha; Kamesh Munagala

  • Affiliations:
  • University of Pennsylvania, Philadelphia, PA; Duke University, Durham, NC

  • Venue:
  • Proceedings of the thirty-ninth annual ACM symposium on Theory of computing
  • Year:
  • 2007

Abstract

We present the first approximation algorithms for a large class of budgeted learning problems. A classic example is the budgeted multi-armed bandit problem. In this problem, each arm of the bandit has an unknown reward distribution, on which a prior is specified as input. Knowledge about the underlying distribution can be refined during an exploration phase by playing the arm and observing the rewards. However, there is a budget on the total number of plays allowed during exploration. After the exploration phase, the arm with the highest (posterior) expected reward is chosen for exploitation. The goal is to design the adaptive exploration phase, subject to the budget constraint on the number of plays, so as to maximize the expected reward of the arm chosen for exploitation. While this problem is reasonably well understood in the infinite-horizon discounted-reward setting, the budgeted version of the problem is NP-hard. For this problem and several generalizations, we provide approximate policies that achieve a reward within a constant factor of the reward of the optimal policy. Our algorithms use a novel linear program rounding technique based on stochastic packing.
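The problem setup described in the abstract can be made concrete with a small sketch. The code below is not the paper's algorithm: it does not use linear program rounding or stochastic packing, and it carries no approximation guarantee. It only illustrates the two phases, assuming Bernoulli rewards with Beta priors and a naive round-robin exploration policy. All names (`posterior_mean`, `explore_round_robin`, `choose_arm`) are hypothetical and introduced here for illustration.

```python
import random


def posterior_mean(alpha, beta):
    """Expected reward of an arm whose belief is Beta(alpha, beta)."""
    return alpha / (alpha + beta)


def explore_round_robin(true_means, priors, budget, rng):
    """Spend `budget` plays cycling through the arms (a naive baseline,
    assumed here for illustration; NOT the policy from the paper).

    true_means: hidden Bernoulli success probability of each arm.
    priors:     list of (alpha, beta) Beta parameters, one per arm.
    Returns the posterior (alpha, beta) of every arm after exploration.
    """
    posteriors = [list(p) for p in priors]
    for t in range(budget):
        arm = t % len(posteriors)
        # Play the arm and observe a 0/1 reward.
        reward = 1 if rng.random() < true_means[arm] else 0
        # Beta-Bernoulli conjugate update of the belief about this arm.
        posteriors[arm][0] += reward
        posteriors[arm][1] += 1 - reward
    return [tuple(p) for p in posteriors]


def choose_arm(posteriors):
    """Exploitation step: pick the arm with the highest posterior mean."""
    return max(range(len(posteriors)),
               key=lambda i: posterior_mean(*posteriors[i]))


if __name__ == "__main__":
    rng = random.Random(0)
    true_means = [0.2, 0.5, 0.8]   # unknown to the policy
    priors = [(1, 1)] * 3          # uniform Beta(1, 1) prior on each arm
    posteriors = explore_round_robin(true_means, priors, budget=30, rng=rng)
    best = choose_arm(posteriors)
    print("posteriors:", posteriors)
    print("arm chosen for exploitation:", best)
```

An adaptive policy would choose each play based on the rewards observed so far rather than cycling through the arms in a fixed order; designing such a policy under the budget is the problem the abstract describes.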