Adaptive routing with end-to-end feedback: distributed learning and geometric approaches

Authors:
Baruch Awerbuch;Robert D. Kleinberg
Affiliations:
Johns Hopkins University, Baltimore, MD;MIT, Cambridge, MA
Venue:
STOC '04 Proceedings of the thirty-sixth annual ACM symposium on Theory of computing
Year:
2004

Citing 3
Cited 34

The weighted majority algorithm

Information and Computation
Path Kernels and Multiplicative Updates

COLT '02 Proceedings of the 15th Annual Conference on Computational Learning Theory
Gambling in a rigged casino: The adversarial multi-armed bandit problem

FOCS '95 Proceedings of the 36th Annual Symposium on Foundations of Computer Science

Three dozen papers on online algorithms

ACM SIGACT News
Online convex optimization in the bandit setting: gradient descent without a gradient

SODA '05 Proceedings of the sixteenth annual ACM-SIAM symposium on Discrete algorithms
Anytime algorithms for multi-armed bandit problems

SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm
Robbing the bandit: less regret in online geometric optimization against an adaptive adversary

SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm
Efficient algorithms for online decision problems

Journal of Computer and System Sciences - Special issue: Learning theory 2003
Fast convergence to Wardrop equilibria by adaptive sampling methods

Proceedings of the thirty-eighth annual ACM symposium on Theory of computing
Approximation algorithms and online mechanisms for item pricing

EC '06 Proceedings of the 7th ACM conference on Electronic commerce
Routing without regret: on convergence to nash equilibria of regret-minimizing algorithms in routing games

Proceedings of the twenty-fifth annual ACM symposium on Principles of distributed computing
Playing games with approximation algorithms

Proceedings of the thirty-ninth annual ACM symposium on Theory of computing
Online linear optimization and adaptive routing

Journal of Computer and System Sciences
Sampling algorithms and coresets for ℓ_p regression

Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms
REPLEX: dynamic traffic engineering based on wardrop routing policies

CoNEXT '06 Proceedings of the 2006 ACM CoNEXT conference
Regret minimization and the price of total anarchy

STOC '08 Proceedings of the fortieth annual ACM symposium on Theory of computing
Better algorithms for benign bandits

SODA '09 Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms
The Price of Malice in Linear Congestion Games

WINE '08 Proceedings of the 4th International Workshop on Internet and Network Economics
Game-theoretic timing analysis

Proceedings of the 2008 IEEE/ACM International Conference on Computer-Aided Design
Large-scale uncertainty management systems: learning and exploiting your data

Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Management of Variable Data Streams in Networks

Algorithmics of Large and Complex Networks
Adaptive routing with stale information

Theoretical Computer Science
Adaptive ε-greedy exploration in reinforcement learning based on value differences

KI'10 Proceedings of the 33rd annual German conference on Advances in artificial intelligence
Dueling algorithms

Proceedings of the forty-third annual ACM symposium on Theory of computing
Better Algorithms for Benign Bandits

The Journal of Machine Learning Research
Fast Convergence to Wardrop Equilibria by Adaptive Sampling Methods

SIAM Journal on Computing
On following the perturbed leader in the bandit setting

ALT'05 Proceedings of the 16th international conference on Algorithmic Learning Theory
The shortest path problem under partial monitoring

COLT'06 Proceedings of the 19th annual conference on Learning Theory
Multi-armed bandit algorithms and empirical evaluation

ECML'05 Proceedings of the 16th European conference on Machine Learning
Rank, trace-norm and max-norm

COLT'05 Proceedings of the 18th annual conference on Learning Theory
FPL analysis for adaptive bandits

SAGA'05 Proceedings of the Third international conference on Stochastic Algorithms: foundations and applications
Combinatorial bandits

Journal of Computer and System Sciences
Quantitative Analysis of Systems Using Game-Theoretic Learning

ACM Transactions on Embedded Computing Systems (TECS) - Special Section on CAPA'09, Special Section on WHS'09, and Special Section VCPSS'09
Approximating wardrop equilibria with finitely many agents

DISC'07 Proceedings of the 21st international conference on Distributed Computing
Combinatorial network optimization with unknown variables: multi-armed bandits with linear rewards and individual observations

IEEE/ACM Transactions on Networking (TON)
Adaptive collective routing using gaussian process dynamic congestion models

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
Trading regret for efficiency: online convex optimization with long term constraints

The Journal of Machine Learning Research

Abstract

Minimal-delay routing is a fundamental task in networks. Since delays depend on the (potentially unpredictable) traffic distribution, online delay optimization can be quite challenging. While uncertainty about the current network delays may make the current routing choices sub-optimal, the algorithm can nevertheless try to learn the traffic patterns and keep adapting its choice of routing paths so as to perform nearly as well as the best static path. This online shortest path problem is a special case of online linear optimization, a problem in which an online algorithm must choose, in each round, a strategy from some compact set S ⊆ R^d so as to try to minimize a linear cost function which is only revealed at the end of the round. Kalai and Vempala [4] gave an algorithm for such problems in the transparent feedback model, where the entire cost function is revealed at the end of the round. Here we present an algorithm for online linear optimization in the more challenging opaque feedback model, in which only the cost of the chosen strategy is revealed at the end of the round. In the special case of shortest paths, opaque feedback corresponds to the notion that in each round the algorithm learns only the end-to-end cost of the chosen path, not the cost of every edge in the network.

We also present a second algorithm for online shortest paths, which solves the shortest-path problem using a chain of online decision oracles, one at each node of the graph. This has several advantages over the online linear optimization approach. First, it is effective against an adaptive adversary, whereas our linear optimization algorithm assumes an oblivious adversary. Second, even in the case of an oblivious adversary, the second algorithm performs better than the first, as measured by their additive regret.
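To make the opaque feedback model concrete, the following is a minimal sketch of a learner that sees only the end-to-end cost of the path it chose in each round. It is not either of the algorithms described in the abstract: it substitutes the standard Exp3 adversarial-bandit rule and naively treats every candidate path as a separate arm. The function names (`exp3_routing`, `best_static_cost`, `path_cost`), the parameter `gamma`, the toy graph, and the assumption that every edge cost lies in [0, 1] are all illustrative choices, not taken from the paper.

```python
import math
import random


def path_cost(path, edge_costs):
    """End-to-end cost of a path: the sum of the costs of its edges."""
    return sum(edge_costs[e] for e in path)


def exp3_routing(paths, rounds, gamma=0.1, seed=0):
    """Exp3 with one arm per path (illustrative, not the paper's algorithm).

    paths  : list of paths, each a tuple of edge names
    rounds : list of dicts mapping edge name -> cost in [0, 1] (assumed)
    Returns the total cost paid by the learner.
    """
    rng = random.Random(seed)
    k = len(paths)
    max_len = max(len(p) for p in paths)  # used to scale path costs into [0, 1]
    log_w = [0.0] * k                     # log-weights, to avoid overflow
    total = 0.0
    for edge_costs in rounds:
        m = max(log_w)
        w = [math.exp(x - m) for x in log_w]
        s = sum(w)
        # Mix the weight-based distribution with uniform exploration.
        probs = [(1 - gamma) * wi / s + gamma / k for wi in w]
        i = rng.choices(range(k), weights=probs)[0]
        # Opaque feedback: only the chosen path's total cost is observed.
        cost = path_cost(paths[i], edge_costs)
        total += cost
        # Importance-weighted reward estimate for the chosen arm only.
        reward = 1.0 - cost / max_len
        log_w[i] += gamma * (reward / probs[i]) / k
    return total


def best_static_cost(paths, rounds):
    """Cost of the best single path in hindsight (the regret benchmark)."""
    return min(sum(path_cost(p, c) for c in rounds) for p in paths)


if __name__ == "__main__":
    # Hypothetical toy instance: three paths over four edges, fixed costs.
    paths = [("a", "b"), ("c", "d"), ("a", "d")]
    rounds = [{"a": 0.1, "b": 0.1, "c": 0.9, "d": 0.9}] * 2000
    alg = exp3_routing(paths, rounds)
    best = best_static_cost(paths, rounds)
    print("learner cost:", alg, "best static path:", best, "regret:", alg - best)
```

The difference between the learner's total cost and `best_static_cost` is the additive regret that the abstract uses as its performance measure. Enumerating paths as arms is only workable for tiny graphs, since the number of paths can grow exponentially with the size of the network.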