The ratio index for budgeted learning, with applications

  • Authors:
  • Ashish Goel; Sanjeev Khanna; Brad Null

  • Affiliations:
  • Stanford University; University of Pennsylvania, Philadelphia, PA; Stanford University

  • Venue:
  • SODA '09: Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms
  • Year:
  • 2009


Abstract

In the budgeted learning problem, we are allowed to experiment on a set of alternatives (given a fixed experimentation budget) with the goal of picking a single alternative with the largest possible expected payoff. Constant-factor approximation algorithms for this problem were developed by Guha and Munagala by rounding a linear program that couples the various alternatives together. In this paper we present an index for this problem, which we call the ratio index, that also guarantees a constant-factor approximation. Index-based policies have the advantage that a single number (i.e., the index) can be computed for each alternative independently of all other alternatives, and the alternative with the highest index is experimented upon. This is analogous to the famous Gittins index for the discounted multi-armed bandit problem. The ratio index has several interesting structural properties. First, we show that it can be computed in strongly polynomial time. Second, we show that, with the appropriate discount factor, the Gittins index and our ratio index are constant-factor approximations of each other, and hence the Gittins index also gives a constant-factor approximation to the budgeted learning problem. Finally, we show that the ratio index can be used to create an index-based policy that achieves an O(1)-approximation for the finite-horizon version of the multi-armed bandit problem. Moreover, the policy does not require any knowledge of the horizon (whereas we compare its performance against an optimal strategy that is aware of the horizon). This yields the following surprising result: there is an index-based policy that achieves an O(1)-approximation for the multi-armed bandit problem, oblivious to the underlying discount factor.
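
The abstract does not define the ratio index itself, but the overall shape of an index-based policy for budgeted learning can be sketched. Below is a minimal Python illustration, assuming Beta-Bernoulli alternatives; `placeholder_index` is a hypothetical stand-in for the paper's actual ratio index, chosen only to show the defining property that each alternative's score is computed from that alternative's own state alone. The budget is spent greedily on the highest-index arm, and afterward the single arm with the best posterior mean is selected.

```python
import random
from dataclasses import dataclass

@dataclass
class Arm:
    """Beta-Bernoulli alternative: a posterior over an unknown payoff probability."""
    alpha: float = 1.0  # prior plus observed successes
    beta: float = 1.0   # prior plus observed failures

    def mean(self) -> float:
        # Posterior mean of the payoff probability.
        return self.alpha / (self.alpha + self.beta)

def placeholder_index(arm: Arm) -> float:
    """Hypothetical stand-in for the ratio index: any per-arm score that
    depends only on this arm's own state. Here, posterior mean plus a
    simple exploration bonus, purely for illustration."""
    n = arm.alpha + arm.beta
    return arm.mean() + 1.0 / n

def budgeted_learning(arms: list[Arm], budget: int, pull) -> int:
    """Spend `budget` experiments via an index policy, then commit to the
    arm with the highest posterior mean payoff."""
    for _ in range(budget):
        # Key property of index policies: each arm's index is computed
        # independently of all other arms.
        i = max(range(len(arms)), key=lambda j: placeholder_index(arms[j]))
        if pull(i):               # run one experiment on arm i
            arms[i].alpha += 1.0  # observed a success
        else:
            arms[i].beta += 1.0   # observed a failure
    # Budget exhausted: pick the single best-looking alternative.
    return max(range(len(arms)), key=lambda j: arms[j].mean())

# Usage sketch: three arms with unknown true payoff probabilities.
true_p = [0.2, 0.5, 0.8]
arms = [Arm() for _ in true_p]
chosen = budgeted_learning(arms, budget=30,
                           pull=lambda i: random.random() < true_p[i])
```

The design point the sketch is meant to surface is the one the abstract emphasizes: because the score is per-arm, the policy needs no joint optimization across alternatives, in contrast to the coupled linear-programming approach of Guha and Munagala.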