The weighted majority algorithm
Information and Computation
The Continuum-Armed Bandit Problem
SIAM Journal on Control and Optimization
A decision-theoretic generalization of on-line learning and an application to boosting
Journal of Computer and System Sciences - Special issue: 26th annual ACM symposium on the theory of computing & STOC'94, May 23–25, 1994, and second annual Europe an conference on computational learning theory (EuroCOLT'95), March 13–15, 1995
Introduction to Algorithms
The Nonstochastic Multiarmed Bandit Problem
SIAM Journal on Computing
Adapting to a reliable network path
Proceedings of the twenty-second annual symposium on Principles of distributed computing
Path kernels and multiplicative updates
The Journal of Machine Learning Research
Adaptive routing with end-to-end feedback: distributed learning and geometric approaches
STOC '04 Proceedings of the thirty-sixth annual ACM symposium on Theory of computing
An algebraic approach to practical and scalable overlay network monitoring
Proceedings of the 2004 conference on Applications, technologies, architectures, and protocols for computer communications
Online convex optimization in the bandit setting: gradient descent without a gradient
SODA '05 Proceedings of the sixteenth annual ACM-SIAM symposium on Discrete algorithms
Robbing the bandit: less regret in online geometric optimization against an adaptive adversary
SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm
Efficient algorithms for online decision problems
Journal of Computer and System Sciences - Special issue: Learning theory 2003
Computing the unmeasured: an algebraic approach to Internet mapping
IEEE Journal on Selected Areas in Communications
Multi-armed bandits in metric spaces
STOC '08 Proceedings of the fortieth annual ACM symposium on Theory of computing
Regret Minimization and Job Scheduling
SOFSEM '10 Proceedings of the 36th Conference on Current Trends in Theory and Practice of Computer Science
From optimization to regret minimization and back again
SysML'08 Proceedings of the Third conference on Tackling computer systems problems with machine learning techniques
Sharp dichotomies for regret minimization in metric spaces
SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms
Understanding and protecting privacy: formal semantics and principled audit mechanisms
ICISS'11 Proceedings of the 7th international conference on Information Systems Security
Ranked bandits in metric spaces: learning diverse rankings over large document collections
The Journal of Machine Learning Research
Hi-index | 0.00 |
This paper studies an online linear optimization problem generalizing the multi-armed bandit problem. Motivated primarily by the task of designing adaptive routing algorithms for overlay networks, we present two randomized online algorithms for selecting a sequence of routing paths in a network with unknown edge delays varying adversarially over time. In contrast with earlier work on this problem, we assume that the only feedback after choosing such a path is the total end-to-end delay of the selected path. We present two algorithms whose regret is sublinear in the number of trials and polynomial in the size of the network. The first of these algorithms generalizes to solve any online linear optimization problem, given an oracle for optimizing linear functions over the set of strategies; our work may thus be interpreted as a general-purpose reduction from offline to online linear optimization. A key element of this algorithm is the notion of a barycentric spanner, a special type of basis for the vector space of strategies which allows any feasible strategy to be expressed as a linear combination of basis vectors using bounded coefficients. We also present a second algorithm for the online shortest path problem, which solves the problem using a chain of online decision oracles, one at each node of the graph. This has several advantages over the online linear optimization approach. First, it is effective against an adaptive adversary, whereas our linear optimization algorithm assumes an oblivious adversary. Second, even in the case of an oblivious adversary, the second algorithm performs slightly better than the first, as measured by their additive regret.