Multi-armed Bandits with Metric Switching Costs

Authors:
Sudipto Guha; Kamesh Munagala
Affiliations:
Department of Computer and Information Sciences, University of Pennsylvania, Philadelphia 19104-6389; Department of Computer Science, Duke University, Durham 27708-0129
Venue:
ICALP '09 Proceedings of the 36th International Colloquium on Automata, Languages and Programming: Part II
Year:
2009

Abstract

In this paper we consider the stochastic multi-armed bandit with metric switching costs. Given a set of locations (arms) in a metric space, prior information about the reward available at each location, the cost of obtaining a sample (play) at each location, and rules for updating the prior based on the samples, the task is to maximize a given objective function subject to a total distance cost of at most L and a total cost of plays of at most C. This fundamental and well-studied problem models several optimization problems in robot navigation, sensor networks, labor economics, and other areas. We develop a general duality-based framework that provides the first O(1)-approximation for metric switching costs, with quite small constants. Since these problems are Max-SNP hard, this result is the best possible. The overall technique and the ensuing structural results are of independent interest in the context of bandit problems with complicated side constraints. Our techniques also improve the approximation ratio of the budgeted learning problem from 4 to 3 + ε.
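
The problem setup described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm or its duality-based framework; it only shows what an instance and its two budget constraints might look like. Everything here is an assumption made for illustration: the arms live in the Euclidean plane, rewards are Bernoulli with Beta priors, travel starts at the first arm played at no cost, and all names (`Arm`, `update`, `plan_costs`, `is_feasible`) are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Arm:
    """One location (arm). The Beta prior family is an illustrative assumption."""
    x: float
    y: float
    alpha: float      # prior pseudo-count of successes
    beta: float       # prior pseudo-count of failures
    play_cost: float  # cost of one sample (play) at this location


def distance(a: Arm, b: Arm) -> float:
    """Euclidean distance, standing in for the general metric."""
    return math.hypot(a.x - b.x, a.y - b.y)


def update(arm: Arm, reward: int) -> Arm:
    """Bayesian update of the Beta prior after one Bernoulli reward (0 or 1)."""
    return Arm(arm.x, arm.y, arm.alpha + reward, arm.beta + (1 - reward), arm.play_cost)


def posterior_mean(arm: Arm) -> float:
    """Expected reward of the arm under its current Beta distribution."""
    return arm.alpha / (arm.alpha + arm.beta)


def plan_costs(arms, plan):
    """Return (distance cost, play cost) of playing arm indices in `plan` in order.

    Assumption: travel starts at the first arm in the plan at no cost.
    """
    travel = sum(distance(arms[i], arms[j]) for i, j in zip(plan, plan[1:]))
    plays = sum(arms[i].play_cost for i in plan)
    return travel, plays


def is_feasible(arms, plan, L, C):
    """Check the two budgets: distance cost at most L, cost of plays at most C."""
    travel, plays = plan_costs(arms, plan)
    return travel <= L and plays <= C
```

For example, with one arm at (0, 0) and another at (3, 4), playing the first arm twice and then moving to the second incurs a distance cost of 5, so the plan is feasible only if L is at least 5 and C covers the three plays.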