We consider one-dimensional continuum-armed bandit problems and propose an improvement of an algorithm of Kleinberg, together with a new set of conditions that give rise to improved rates. In particular, we introduce a novel assumption that is complementary to the previous smoothness conditions, while smoothness of the mean payoff function is required only at its maxima. Under these new assumptions we derive new bounds on the expected regret. In particular, we show that, apart from logarithmic factors, the expected regret scales with the square root of the number of trials, provided that the mean payoff function has finitely many maxima and its second derivative is continuous and non-vanishing at those maxima. This improves a previous result of Cope by weakening the assumptions on the function. We also derive matching lower bounds. To complement the bounds on the expected regret, we provide high-probability bounds that exhibit similar scaling.
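As a rough illustration only (not the authors' exact algorithm, whose refinements and conditions are given in the paper), Kleinberg-style approaches to the continuum-armed bandit discretize the interval [0, 1] into a finite grid of arms and run a standard finite-armed algorithm such as UCB1 on the grid. The sketch below assumes a uniform grid, Gaussian reward noise, and a user-chosen grid size K; all names and parameters are illustrative.

```python
import math
import random

def ucb1_discretized(f, n, K, sigma=0.1):
    """Play n rounds of UCB1 on K arms obtained by uniformly
    discretizing [0, 1]. The reward for pulling the arm at
    location x is f(x) plus Gaussian noise with std dev sigma.
    Returns the total reward collected over the n rounds."""
    xs = [(k + 0.5) / K for k in range(K)]  # arm locations on the grid
    counts = [0] * K                        # number of pulls per arm
    means = [0.0] * K                       # running mean reward per arm
    total = 0.0
    for t in range(1, n + 1):
        if t <= K:
            a = t - 1                       # play each arm once to initialize
        else:
            # UCB1 index: empirical mean plus exploration bonus
            a = max(range(K),
                    key=lambda k: means[k]
                    + math.sqrt(2 * math.log(t) / counts[k]))
        r = f(xs[a]) + random.gauss(0.0, sigma)
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]  # incremental mean update
        total += r
    return total
```

The grid size K trades off discretization bias against the cost of exploring more arms. The abstract's curvature condition (continuous, non-vanishing second derivative at finitely many maxima) makes the discretization error near an optimum quadratic in the grid spacing, which is what permits a finer grid and, up to logarithmic factors, the square-root-of-n regret scaling, rather than the slower rates obtainable under global smoothness alone.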