In this paper we consider the multi-armed bandit problem in which the arms are chosen from a subset of the real line and the mean rewards are assumed to be a continuous function of the arms. The problem with an infinite number of arms is much harder than the usual finite-armed one because the built-in learning task is now infinite-dimensional. We devise a kernel-estimator-based learning scheme for the mean reward as a function of the arm. Using this learning scheme, we construct a class of certainty-equivalence control schemes with forcing and derive asymptotic upper bounds on their learning loss. To the best of our knowledge, these bounds are the strongest rates yet available. Moreover, they are stronger than the $o(n)$ required for optimality with respect to the average-cost-per-unit-time criterion.
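The abstract's scheme combines a nonparametric estimate of the mean-reward curve with certainty-equivalence play interleaved with forced exploration. The following is a minimal illustrative sketch, not the paper's algorithm: the grid discretisation of the arm set, the Gaussian kernel, the bandwidth schedule `t**(-0.2)`, and the square-number forcing times are all assumed choices made here for concreteness.

```python
import numpy as np

def kernel_estimate(x, xs, ys, h):
    """Nadaraya-Watson kernel estimate of the mean reward at points x,
    from observed (arm, reward) pairs (xs, ys), with bandwidth h."""
    if len(xs) == 0:
        return np.zeros_like(x, dtype=float)
    d = (np.asarray(x)[:, None] - np.asarray(xs)[None, :]) / h
    w = np.exp(-0.5 * d * d)                    # Gaussian kernel weights
    sw = np.maximum(w.sum(axis=1), 1e-12)       # guard against division by zero
    return (w @ np.asarray(ys)) / sw

def run_bandit(mean_reward, horizon, rng, noise=0.1):
    """Certainty-equivalence play with forcing on arms in [0, 1].

    At a sparse deterministic set of rounds (here: perfect squares) a
    uniformly random arm is forced; otherwise the arm maximising the
    current kernel estimate is pulled.  Returns the average reward.
    """
    grid = np.linspace(0.0, 1.0, 201)           # discretised candidate arms
    xs, ys = [], []
    total = 0.0
    for t in range(1, horizon + 1):
        h = t ** (-0.2)                         # shrinking bandwidth (illustrative rate)
        if int(np.sqrt(t)) ** 2 == t:           # forcing round: explore uniformly
            arm = rng.uniform(0.0, 1.0)
        else:                                   # certainty-equivalence round: exploit
            est = kernel_estimate(grid, xs, ys, h)
            arm = grid[int(np.argmax(est))]
        reward = mean_reward(arm) + noise * rng.standard_normal()
        xs.append(arm)
        ys.append(reward)
        total += reward
    return total / horizon
```

With a smooth unimodal mean-reward function such as `1 - (a - 0.7)**2`, the average reward approaches the optimum as the forcing rounds become sparse relative to the horizon, which is the mechanism behind the $o(n)$ learning-loss requirement mentioned above.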