In a multi-armed bandit problem, an online algorithm chooses from a set of strategies in a sequence of $n$ trials so as to maximize the total payoff of the chosen strategies. While the performance of bandit algorithms with a small finite strategy set is quite well understood, bandit problems with large strategy sets are still a topic of very active investigation, motivated by practical applications such as online auctions and web advertisement. The goal of such research is to identify broad and natural classes of strategy sets and payoff functions that enable the design of efficient solutions. In this work we study a very general setting for the multi-armed bandit problem in which the strategies form a metric space and the payoff function satisfies a Lipschitz condition with respect to the metric. We refer to this problem as the "Lipschitz MAB problem". We present a complete solution for the multi-armed bandit problem in this setting. That is, for every metric space $(L, X)$ we define an isometry invariant $\mathrm{MaxMinCOV}(X)$ which bounds from below the performance of Lipschitz MAB algorithms for $X$, and we present an algorithm that comes arbitrarily close to meeting this bound. Furthermore, our technique gives even better results for benign payoff functions.
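As a concrete illustration of the setting (not of the paper's adaptive algorithm), the sketch below simulates the simplest baseline for a Lipschitz MAB instance on the metric space $[0, 1]$: fix a uniform grid of finitely many strategies and run the standard UCB1 index over them. The expected-payoff function `mu`, the grid size, and the noise model are all illustrative assumptions.

```python
import math
import random

def lipschitz_ucb(payoff, n_rounds, n_arms, rng):
    """Uniform-discretization baseline for a Lipschitz bandit on [0, 1]:
    fix n_arms evenly spaced strategies and run UCB1 over them.
    (The paper's algorithm adapts its discretization to the payoffs;
    this fixed grid is the naive benchmark such methods improve on.)"""
    arms = [i / (n_arms - 1) for i in range(n_arms)]
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    total = 0.0
    for t in range(1, n_rounds + 1):
        if t <= n_arms:                      # play each arm once first
            k = t - 1
        else:                                # pick the arm with the highest UCB1 index
            k = max(range(n_arms),
                    key=lambda i: sums[i] / counts[i]
                               + math.sqrt(2.0 * math.log(t) / counts[i]))
        # stochastic payoff in [0, 1] with expected value near payoff(arms[k])
        reward = min(1.0, max(0.0, payoff(arms[k]) + rng.uniform(-0.1, 0.1)))
        counts[k] += 1
        sums[k] += reward
        total += reward
    best = max(range(n_arms), key=lambda i: counts[i])
    return total / n_rounds, arms[best]

rng = random.Random(0)
mu = lambda x: 1.0 - abs(x - 0.7)            # 1-Lipschitz, maximized at x = 0.7
avg, best_x = lipschitz_ucb(mu, n_rounds=50000, n_arms=16, rng=rng)
```

After enough rounds the most-played grid point lies near the payoff maximum at $x = 0.7$, and the average payoff approaches the optimum; the paper's contribution is to characterize, via $\mathrm{MaxMinCOV}(X)$, how much better than this fixed-grid approach any algorithm can do on a given metric space.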