Exponentiated gradient versus gradient descent for linear predictors
Information and Computation
A one-measurement form of simultaneous perturbation stochastic approximation
Automatica (Journal of IFAC)
Random walks and an O*(n5) volume algorithm for convex bodies
Random Structures & Algorithms
Solving convex programs by random walks
STOC '02 Proceedings of the thirty-fourth annual ACM symposium on Theory of computing
Neuro-Dynamic Programming
Randomized Algorithms for Stochastic Approximation under Arbitrary Disturbances
Automation and Remote Control
Path Kernels and Multiplicative Updates
COLT '02 Proceedings of the 15th Annual Conference on Computational Learning Theory
Introduction to Stochastic Search and Optimization
Gambling in a rigged casino: The adversarial multi-armed bandit problem
FOCS '95 Proceedings of the 36th Annual Symposium on Foundations of Computer Science
Efficient algorithms for universal portfolios
FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
Simulated Annealing in Convex Bodies and an O*(n4) Volume Algorithm
FOCS '03 Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science
Adaptive routing with end-to-end feedback: distributed learning and geometric approaches
STOC '04 Proceedings of the thirty-sixth annual ACM symposium on Theory of computing
SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithms
Anytime algorithms for multi-armed bandit problems
SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithms
ACM SIGACT News
Perspectives on multiagent learning
Artificial Intelligence
Generalized multiagent learning with performance bound
Autonomous Agents and Multi-Agent Systems
Online linear optimization and adaptive routing
Journal of Computer and System Sciences
Agnostically learning decision trees
STOC '08 Proceedings of the fortieth annual ACM symposium on Theory of computing
Multi-armed bandits in metric spaces
STOC '08 Proceedings of the fortieth annual ACM symposium on Theory of computing
Efficient bandit algorithms for online multiclass prediction
Proceedings of the 25th international conference on Machine learning
Approximation algorithms for restless bandit problems
SODA '09 Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms
Better algorithms for benign bandits
SODA '09 Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms
A Nonparametric Asymptotic Analysis of Inventory Planning with Censored Demand
Mathematics of Operations Research
A simpler unified analysis of budget perceptrons
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Interactively optimizing information retrieval systems as a dueling bandits problem
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Large-scale uncertainty management systems: learning and exploiting your data
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Multi-armed Bandits with Metric Switching Costs
ICALP '09 Proceedings of the 36th International Colloquium on Automata, Languages and Programming: Part II
Efficient no-regret multiagent learning
AAAI'05 Proceedings of the 20th national conference on Artificial intelligence - Volume 1
Emerging behaviors by learning joint coordination in articulated mobile robots
IWANN'07 Proceedings of the 9th international work conference on Artificial neural networks
Online learning with prior knowledge
COLT'07 Proceedings of the 20th annual conference on Learning theory
Approximation algorithms for restless bandit problems
Journal of the ACM (JACM)
Sharp dichotomies for regret minimization in metric spaces
SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms
Online learning in adversarial Lipschitz environments
ECML PKDD'10 Proceedings of the 2010 European conference on Machine learning and knowledge discovery in databases: Part II
No regret learning in oligopolies: cournot vs. bertrand
SAGT'10 Proceedings of the Third international conference on Algorithmic game theory
Design is as Easy as Optimization
SIAM Journal on Discrete Mathematics
Better Algorithms for Benign Bandits
The Journal of Machine Learning Research
Hierarchical Knowledge Gradient for Sequential Sampling
The Journal of Machine Learning Research
On following the perturbed leader in the bandit setting
ALT'05 Proceedings of the 16th international conference on Algorithmic Learning Theory
Logarithmic regret algorithms for online convex optimization
COLT'06 Proceedings of the 19th annual conference on Learning Theory
Design is as easy as optimization
ICALP'06 Proceedings of the 33rd international conference on Automata, Languages and Programming - Volume Part I
Online Learning and Online Convex Optimization
Foundations and Trends® in Machine Learning
Toward a classification of finite partial-monitoring games
Theoretical Computer Science
Ranked bandits in metric spaces: learning diverse rankings over large document collections
The Journal of Machine Learning Research
Trading regret for efficiency: online convex optimization with long term constraints
The Journal of Machine Learning Research
Online submodular minimization
The Journal of Machine Learning Research
We study a general online convex optimization problem. We have a convex set S and an unknown sequence of cost functions c_1, c_2, ..., and in each period we choose a feasible point x_t in S and learn the cost c_t(x_t). If the function c_t is also revealed after each period then, as Zinkevich shows in [25], gradient descent can be used on these functions to obtain regret bounds of O(√n). That is, after n rounds, the total cost incurred will be at most O(√n) more than the cost of the best single feasible decision chosen with the benefit of hindsight, min_x Σ_t c_t(x).

We extend this to the "bandit" setting, where in each period only the cost c_t(x_t) is revealed, and we bound the expected regret by O(n^{3/4}).

Our approach uses a simple approximation of the gradient that is computed from evaluating c_t at a single (random) point. We show that this biased estimate suffices to approximate gradient descent on the sequence of functions. In other words, it is possible to use gradient descent without seeing anything more than the value of the functions at a single point. The guarantees hold even in the most general case: online against an adaptive adversary.

For the online linear optimization problem [15], algorithms with low regret in the bandit setting have recently been given against oblivious [1] and adaptive adversaries [19]. In contrast to these algorithms, which distinguish between explicit explore and exploit periods, our algorithm can be interpreted as doing a small amount of exploration in each period.
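The one-point gradient estimate described above can be sketched in a few lines. This is a minimal illustration, not the paper's exact algorithm: it assumes the feasible set is a Euclidean ball of a given radius, fixes the perturbation size delta and step size eta rather than tuning them for the O(n^{3/4}) bound, and omits the shrinking of the feasible set that keeps the perturbed point strictly inside S. The function name `one_point_bandit_gd` and its parameters are illustrative.

```python
import numpy as np

def one_point_bandit_gd(cost, d, T, delta=0.1, eta=0.01, radius=1.0):
    """Gradient descent using a one-point gradient estimate.

    Each round, the unknown cost function is evaluated at a single
    randomly perturbed point; that one value yields a biased estimate
    of the gradient of a smoothed version of the cost.
    """
    rng = np.random.default_rng(0)
    x = np.zeros(d)
    total = 0.0
    for t in range(T):
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)           # uniform random unit vector
        y = x + delta * u                # the single point actually played
        c = cost(y, t)                   # only the value c_t(y) is observed
        total += c
        g = (d / delta) * c * u          # one-point gradient estimate
        x = x - eta * g                  # gradient-descent step
        norm = np.linalg.norm(x)
        if norm > radius:                # project back onto the ball
            x *= radius / norm
    return x, total
```

On a fixed quadratic cost the iterates drift toward the minimizer even though no gradient is ever observed, which is the point of the estimate: in expectation, (d/delta)·c_t(x + delta·u)·u is the gradient of c_t averaged over a small ball around x.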