Online convex optimization in the bandit setting: gradient descent without a gradient

Authors:
Abraham D. Flaxman;Adam Tauman Kalai;H. Brendan McMahan
Affiliations:
Carnegie Mellon University;Toyota Technological Institute at Chicago;Carnegie Mellon University
Venue:
SODA '05 Proceedings of the sixteenth annual ACM-SIAM symposium on Discrete algorithms
Year:
2005

Citing 12
Cited 36

Exponentiated gradient versus gradient descent for linear predictors

Information and Computation
A one-measurement form of simultaneous perturbation stochastic approximation

Automatica (Journal of IFAC)
Random walks and an O*(n^5) volume algorithm for convex bodies

Random Structures & Algorithms
Solving convex programs by random walks

STOC '02 Proceedings of the thirty-fourth annual ACM symposium on Theory of computing
Neuro-Dynamic Programming

Neuro-Dynamic Programming
Randomized Algorithms for Stochastic Approximation under Arbitrary Disturbances

Automation and Remote Control
Path Kernels and Multiplicative Updates

COLT '02 Proceedings of the 15th Annual Conference on Computational Learning Theory
Introduction to Stochastic Search and Optimization

Introduction to Stochastic Search and Optimization
Gambling in a rigged casino: The adversarial multi-armed bandit problem

FOCS '95 Proceedings of the 36th Annual Symposium on Foundations of Computer Science
Efficient algorithms for universal portfolios

FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
Simulated Annealing in Convex Bodies and an O*(n^4) Volume Algorithm

FOCS '03 Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science
Adaptive routing with end-to-end feedback: distributed learning and geometric approaches

STOC '04 Proceedings of the thirty-sixth annual ACM symposium on Theory of computing

Self-improving algorithms

SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithms
Anytime algorithms for multi-armed bandit problems

SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithms
2005: an offline perspective

ACM SIGACT News
Perspectives on multiagent learning

Artificial Intelligence
Generalized multiagent learning with performance bound

Autonomous Agents and Multi-Agent Systems
Online linear optimization and adaptive routing

Journal of Computer and System Sciences
Agnostically learning decision trees

STOC '08 Proceedings of the fortieth annual ACM symposium on Theory of computing
Multi-armed bandits in metric spaces

STOC '08 Proceedings of the fortieth annual ACM symposium on Theory of computing
Efficient bandit algorithms for online multiclass prediction

Proceedings of the 25th international conference on Machine learning
Approximation algorithms for restless bandit problems

SODA '09 Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms
Better algorithms for benign bandits

SODA '09 Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms
A Nonparametric Asymptotic Analysis of Inventory Planning with Censored Demand

Mathematics of Operations Research
An Adaptive Algorithm for Finding the Optimal Base-Stock Policy in Lost Sales Inventory Systems with Censored Demand

Mathematics of Operations Research
A simpler unified analysis of budget perceptrons

ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Interactively optimizing information retrieval systems as a dueling bandits problem

ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Large-scale uncertainty management systems: learning and exploiting your data

Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Multi-armed Bandits with Metric Switching Costs

ICALP '09 Proceedings of the 36th International Colloquium on Automata, Languages and Programming: Part II
Emerging motor behaviors: Learning joint coordination in articulated mobile robots

Neurocomputing
Efficient no-regret multiagent learning

AAAI'05 Proceedings of the 20th national conference on Artificial intelligence - Volume 1
Emerging behaviors by learning joint coordination in articulated mobile robots

IWANN'07 Proceedings of the 9th international work conference on Artificial neural networks
Online learning with prior knowledge

COLT'07 Proceedings of the 20th annual conference on Learning theory
Approximation algorithms for restless bandit problems

Journal of the ACM (JACM)
Sharp dichotomies for regret minimization in metric spaces

SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms
Online learning in adversarial Lipschitz environments

ECML PKDD'10 Proceedings of the 2010 European conference on Machine learning and knowledge discovery in databases: Part II
No regret learning in oligopolies: Cournot vs. Bertrand

SAGT'10 Proceedings of the Third international conference on Algorithmic game theory
Design is as Easy as Optimization

SIAM Journal on Discrete Mathematics
Better Algorithms for Benign Bandits

The Journal of Machine Learning Research
Hierarchical Knowledge Gradient for Sequential Sampling

The Journal of Machine Learning Research
On following the perturbed leader in the bandit setting

ALT'05 Proceedings of the 16th international conference on Algorithmic Learning Theory
Logarithmic regret algorithms for online convex optimization

COLT'06 Proceedings of the 19th annual conference on Learning Theory
Design is as easy as optimization

ICALP'06 Proceedings of the 33rd international conference on Automata, Languages and Programming - Volume Part I
Online Learning and Online Convex Optimization

Foundations and Trends® in Machine Learning
Toward a classification of finite partial-monitoring games

Theoretical Computer Science
Ranked bandits in metric spaces: learning diverse rankings over large document collections

The Journal of Machine Learning Research
Trading regret for efficiency: online convex optimization with long term constraints

The Journal of Machine Learning Research
Online submodular minimization

The Journal of Machine Learning Research

Abstract

We study a general online convex optimization problem. We have a convex set S and an unknown sequence of cost functions c_1, c_2, ..., and in each period, we choose a feasible point x_t in S, and learn the cost c_t(x_t). If the function c_t is also revealed after each period then, as Zinkevich shows in [25], gradient descent can be used on these functions to get regret bounds of O(√n). That is, after n rounds, the total cost incurred will be O(√n) more than the cost of the best single feasible decision chosen with the benefit of hindsight, min_x Σ_t c_t(x).

We extend this to the "bandit" setting, where, in each period, only the cost c_t(x_t) is revealed, and bound the expected regret as O(n^{3/4}).

Our approach uses a simple approximation of the gradient that is computed from evaluating c_t at a single (random) point. We show that this biased estimate is sufficient to approximate gradient descent on the sequence of functions. In other words, it is possible to use gradient descent without seeing anything more than the value of the functions at a single point. The guarantees hold even in the most general case: online against an adaptive adversary.

For the online linear optimization problem [15], algorithms with low regrets in the bandit setting have recently been given against oblivious [1] and adaptive adversaries [19]. In contrast to these algorithms, which distinguish between explicit explore and exploit periods, our algorithm can be interpreted as doing a small amount of exploration in each period.
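
The single-point gradient approximation described in the abstract can be illustrated with a minimal Python sketch. The abstract does not spell out the estimator, so the form used here is an assumption based on the standard one-point construction: play x_t = y_t + δ·u_t for a uniformly random unit vector u_t, and use (d/δ)·c_t(x_t)·u_t as the gradient estimate. This is an unbiased estimate of the gradient of a smoothed version of c_t, which is why it is only a biased estimate for c_t itself. The feasible set (the unit ball), the quadratic test cost, the function names, and the values of `delta` and `step` are all illustrative choices, not the paper's tuned parameters.

```python
import math
import random


def random_unit_vector(d, rng):
    """Draw a point uniformly from the unit sphere in R^d."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 0.0:
            # A normalized Gaussian vector is uniform on the sphere.
            return [a / norm for a in v]


def project_to_ball(y, radius):
    """Euclidean projection onto the origin-centered ball of given radius."""
    norm = math.sqrt(sum(a * a for a in y))
    if norm <= radius:
        return list(y)
    return [a * radius / norm for a in y]


def bandit_gradient_descent(costs, d, delta, step, rng):
    """Run gradient descent using only one cost value per round.

    Assumption: the feasible set S is the unit ball in R^d, so the
    iterate y is kept in the shrunken ball of radius 1 - delta and the
    played point x = y + delta * u always stays inside S.
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    y = [0.0] * d
    played, total = [], 0.0
    for cost in costs:
        u = random_unit_vector(d, rng)
        x = [yi + delta * ui for yi, ui in zip(y, u)]
        value = cost(x)  # the only feedback received this round
        played.append(x)
        total += value
        # One-point gradient estimate: (d / delta) * c_t(x_t) * u_t.
        grad = [(d / delta) * value * ui for ui in u]
        y = project_to_ball(
            [yi - step * gi for yi, gi in zip(y, grad)], 1.0 - delta
        )
    return played, total


if __name__ == "__main__":
    rng = random.Random(0)
    target = [0.3, -0.2]  # hypothetical minimizer of the test cost
    n = 2000
    costs = [lambda x: sum((a - b) ** 2 for a, b in zip(x, target))] * n
    _, total = bandit_gradient_descent(
        costs, d=2, delta=n ** -0.25, step=0.005, rng=rng
    )
    print("average cost per round:", total / n)
```

Every round perturbs the current iterate slightly, which matches the abstract's remark that the algorithm performs a small amount of exploration in each period instead of separating explore and exploit phases.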