In a multi-armed bandit problem, an online algorithm chooses from a set of strategies in a sequence of $n$ trials so as to maximize the total payoff of the chosen strategies. While the performance of bandit algorithms with a small finite strategy set is quite well understood, bandit problems with large strategy sets are still a topic of very active investigation, motivated by practical applications such as online auctions and web advertisement. The goal of such research is to identify broad and natural classes of strategy sets and payoff functions that enable the design of efficient solutions. In this work we study a very general setting for the multi-armed bandit problem in which the strategies form a metric space and the payoff function satisfies a Lipschitz condition with respect to the metric. We refer to this problem as the "Lipschitz MAB problem". We present a complete solution for the multi-armed bandit problem in this setting. That is, for every metric space $(L, X)$ we define an isometry invariant $\mathrm{MaxMinCOV}(X)$ which bounds from below the performance of Lipschitz MAB algorithms for $X$, and we present an algorithm that comes arbitrarily close to meeting this bound. Furthermore, our technique gives even better results for benign payoff functions.
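As a concrete illustration of the setting (not of the paper's adaptive algorithm), the sketch below simulates the simplest baseline for a Lipschitz MAB instance on the metric space $[0, 1]$: fix a uniform grid of finitely many strategies and run the standard UCB1 index over them. The expected-payoff function `mu`, the grid size, and the noise model are all illustrative assumptions.

```python
import math
import random

def lipschitz_ucb(payoff, n_rounds, n_arms, rng):
    """Uniform-discretization baseline for a Lipschitz bandit on [0, 1]:
    fix n_arms evenly spaced strategies and run UCB1 over them.
    (The paper's algorithm adapts its discretization to the payoffs;
    this fixed grid is the naive benchmark such methods improve on.)"""
    arms = [i / (n_arms - 1) for i in range(n_arms)]
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    total = 0.0
    for t in range(1, n_rounds + 1):
        if t <= n_arms:                      # play each arm once first
            k = t - 1
        else:                                # pick the arm with the highest UCB1 index
            k = max(range(n_arms),
                    key=lambda i: sums[i] / counts[i]
                               + math.sqrt(2.0 * math.log(t) / counts[i]))
        # stochastic payoff in [0, 1] with expected value near payoff(arms[k])
        reward = min(1.0, max(0.0, payoff(arms[k]) + rng.uniform(-0.1, 0.1)))
        counts[k] += 1
        sums[k] += reward
        total += reward
    best = max(range(n_arms), key=lambda i: counts[i])
    return total / n_rounds, arms[best]

rng = random.Random(0)
mu = lambda x: 1.0 - abs(x - 0.7)            # 1-Lipschitz, maximized at x = 0.7
avg, best_x = lipschitz_ucb(mu, n_rounds=50000, n_arms=16, rng=rng)
```

After enough rounds the most-played grid point lies near the payoff maximum at $x = 0.7$, and the average payoff approaches the optimum; the paper's contribution is to characterize, via $\mathrm{MaxMinCOV}(X)$, how much better than this fixed-grid approach any algorithm can do on a given metric space.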