Elements of Information Theory.
The Continuum-Armed Bandit Problem. SIAM Journal on Control and Optimization.
Journal of the ACM (JACM).
A Game of Prediction with Expert Advice. Journal of Computer and System Sciences (special issue on the Eighth Annual Workshop on Computational Learning Theory, July 5–8, 1995).
The Nonstochastic Multiarmed Bandit Problem. SIAM Journal on Computing.
Finite-time Analysis of the Multiarmed Bandit Problem. Machine Learning.
Using Confidence Bounds for Exploitation-Exploration Trade-offs. Journal of Machine Learning Research.
Online Convex Optimization in the Bandit Setting: Gradient Descent Without a Gradient. Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA '05).
Robbing the Bandit: Less Regret in Online Geometric Optimization Against an Adaptive Adversary. Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA '06).
Prediction, Learning, and Games.
Online Decision Problems with Large Strategy Sets.
Playing Games with Approximation Algorithms. Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of Computing (STOC '07).
Online Linear Optimization and Adaptive Routing. Journal of Computer and System Sciences.
Approximation Algorithms for Partial-Information Based Stochastic Control with Markovian Rewards. Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS '07).
Multi-armed Bandits in Metric Spaces. Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing (STOC '08).
Approximation Algorithms for Restless Bandit Problems. Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA '09).
Better Algorithms for Benign Bandits. Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA '09).
Improved Rates for the Stochastic Continuum-Armed Bandit Problem. Proceedings of the 20th Annual Conference on Learning Theory (COLT '07).
Online Learning with Prior Knowledge. Proceedings of the 20th Annual Conference on Learning Theory (COLT '07).
Pure Exploration in Finitely-Armed and Continuous-Armed Bandits. Theoretical Computer Science.
Convergence Rates of Efficient Global Optimization Algorithms. Journal of Machine Learning Research.
Ranked Bandits in Metric Spaces: Learning Diverse Rankings over Large Document Collections. Journal of Machine Learning Research.
The Lipschitz multi-armed bandit (MAB) problem generalizes the classical multi-armed bandit problem by assuming one is given side information consisting of a priori upper bounds on the difference in expected payoff between certain pairs of strategies. Classical results of Lai-Robbins [28] and Auer et al. [3] imply a logarithmic regret bound for the Lipschitz MAB problem on finite metric spaces. Recent results on continuum-armed bandit problems and their generalizations imply lower bounds of √t, or stronger, for many infinite metric spaces such as the unit interval. Is this dichotomy universal? We prove that the answer is yes: for every metric space, the optimal regret of a Lipschitz MAB algorithm is either bounded above by any f ∈ ω(log t), or bounded below by any g ∈ o(√t). Perhaps surprisingly, this dichotomy does not coincide with the distinction between finite and infinite metric spaces; instead, it depends on whether the completion of the metric space is compact and countable. Our proof connects upper and lower bound techniques in online learning with classical topological notions such as perfect sets and the Cantor-Bendixson theorem. We also consider the full-feedback (a.k.a. best-expert) version of the Lipschitz MAB problem, termed the Lipschitz experts problem, and show that this problem exhibits a similar dichotomy. We proceed to give nearly matching upper and lower bounds on regret in the Lipschitz experts problem on uncountable metric spaces. These bounds are of the form Θ(t^γ), where the exponent γ ∈ [1/2, 1] depends on the metric space. To characterize this dependence, we introduce a novel dimensionality notion tailored to the experts problem. Finally, we show that both the Lipschitz bandits and Lipschitz experts problems become completely intractable (in the sense that no algorithm has regret o(t)) if and only if the completion of the metric space is non-compact.
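To make the setting concrete, the following is a minimal sketch (not an algorithm from the paper) of the standard uniform-discretization baseline for a Lipschitz MAB instance on the unit interval: fix a finite set of arms, run UCB1 over them, and let the Lipschitz condition bound the payoff lost to discretization. The mean-payoff function mu, the number of arms K, and the horizon T below are illustrative assumptions.

```python
# Illustrative sketch only: UCB1 over a fixed uniform discretization of [0, 1].
# mu, K, and T are made up for this example; they are not taken from the paper.
import math
import random

def mu(x):
    # Hypothetical 1-Lipschitz mean payoff on [0, 1], peaked at x = 0.3.
    return 0.9 - 0.8 * abs(x - 0.3)

def lipschitz_ucb1(T=10000, K=32, seed=0):
    rng = random.Random(seed)
    arms = [(i + 0.5) / K for i in range(K)]   # centers of K uniform subintervals
    counts = [0] * K
    sums = [0.0] * K
    total_reward = 0.0
    for t in range(1, T + 1):
        if t <= K:
            i = t - 1                           # play each arm once to initialize
        else:
            # UCB1 index: empirical mean plus confidence radius.
            i = max(range(K), key=lambda j: sums[j] / counts[j]
                    + math.sqrt(2 * math.log(t) / counts[j]))
        # Bernoulli reward with mean mu(arms[i]).
        r = 1.0 if rng.random() < mu(arms[i]) else 0.0
        counts[i] += 1
        sums[i] += r
        total_reward += r
    best = max(mu(x) for x in arms)             # best arm within the discretization
    return best * T - total_reward              # regret against the discretized optimum

if __name__ == "__main__":
    print("regret vs. discretized optimum:", round(lipschitz_ucb1(), 1))
```

On the interval, tuning K to roughly (T / log T)^(1/3) balances discretization error against per-arm exploration cost and is known to give regret of order t^(2/3) up to logarithmic factors; the dichotomy stated in the abstract concerns the sharper question of which metric spaces admit regret as low as any f ∈ ω(log t) at all.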