The Continuum-Armed Bandit Problem

Authors:
Rajeev Agrawal
Venue:
SIAM Journal on Control and Optimization
Year:
1995

Citing 0
Cited 14

An adaptive algorithm for selecting profitable keywords for search-based advertising services
EC '06 Proceedings of the 7th ACM conference on Electronic commerce

Online linear optimization and adaptive routing
Journal of Computer and System Sciences

Multi-armed bandits in metric spaces
STOC '08 Proceedings of the fortieth annual ACM symposium on Theory of computing

The max K-armed bandit: a new model of exploration applied to search heuristic selection
AAAI'05 Proceedings of the 20th national conference on Artificial intelligence - Volume 3

Improved rates for the stochastic continuum-armed bandit problem
COLT'07 Proceedings of the 20th annual conference on Learning theory

Combining active learning and reactive control for robot grasping
Robotics and Autonomous Systems

Sharp dichotomies for regret minimization in metric spaces
SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms

X-Armed Bandits
The Journal of Machine Learning Research

Lipschitz bandits without the Lipschitz constant
ALT'11 Proceedings of the 22nd international conference on Algorithmic learning theory

Dynamic pricing with limited supply
Proceedings of the 13th ACM Conference on Electronic Commerce

Dynamic Pricing Under a General Parametric Choice Model
Operations Research

Ranked bandits in metric spaces: learning diverse rankings over large document collections
The Journal of Machine Learning Research

Online learning for auction mechanism in bandit setting
Decision Support Systems

Optimal learning for sequential sampling with non-parametric beliefs
Journal of Global Optimization

Abstract

In this paper we consider the multiarmed bandit problem where the arms are chosen from a subset of the real line and the mean rewards are assumed to be a continuous function of the arms. The problem with an infinite number of arms is much more difficult than the usual one with a finite number of arms because the built-in learning task is now infinite dimensional. We devise a kernel estimator-based learning scheme for the mean reward as a function of the arms. Using this learning scheme, we construct a class of certainty equivalence control with forcing schemes and derive asymptotic upper bounds on their learning loss. To the best of our knowledge, these bounds are the strongest rates yet available. Moreover, they are stronger than the $o(n)$ required for optimality with respect to the average-cost-per-unit-time criterion.
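
The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's scheme: the grid of arms on [0, 1], the triangular kernel, the fixed bandwidth, the square-root forcing schedule, the Gaussian noise, and the names `simulate` and `mean_reward` are all assumptions chosen for illustration. The code keeps a kernel-weighted running estimate of the mean reward at each grid point, plays a forced exploratory arm whenever the number of forced plays falls below sqrt(t), and otherwise plays the arm with the highest current estimate (certainty equivalence).

```python
import math
import random


def simulate(mean_reward, horizon, grid_size=21, bandwidth=0.12, noise=0.1, seed=0):
    """Return the average learning loss per round of a toy kernel-based scheme.

    All parameters are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    num = [0.0] * grid_size  # kernel-weighted sum of observed rewards
    den = [0.0] * grid_size  # sum of kernel weights
    best = max(mean_reward(x) for x in grid)  # best mean on the grid
    forced = 0
    loss = 0.0

    def estimate(i):
        # Kernel estimate at grid[i]; defaults to 0.0 before any nearby data.
        return num[i] / den[i] if den[i] > 0.0 else 0.0

    for t in range(1, horizon + 1):
        if forced < math.sqrt(t):
            # Forcing: cycle through the grid so every region keeps being sampled.
            arm_index = forced % grid_size
            forced += 1
        else:
            # Certainty equivalence: act as if the current estimate were exact.
            arm_index = max(range(grid_size), key=estimate)

        arm = grid[arm_index]
        reward = mean_reward(arm) + rng.gauss(0.0, noise)

        # Spread the observation to nearby grid points with a triangular kernel.
        for i, x in enumerate(grid):
            weight = max(0.0, 1.0 - abs(x - arm) / bandwidth)
            num[i] += weight * reward
            den[i] += weight

        loss += best - mean_reward(arm)

    return loss / horizon
```

Calling `simulate(lambda x: 1.0 - (x - 0.3) ** 2, horizon=5000)` returns the loss per round against the best grid arm. Under these assumptions only about sqrt(n) of the n rounds are forced, so the forced plays contribute a vanishing share of the average loss, which is the qualitative behavior the $o(n)$ criterion asks for. The sketch measures loss against the best arm on the grid, not the true continuum optimum, and makes no claim about the rates derived in the paper.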