Minimal-delay routing is a fundamental task in networks. Since delays depend on the (potentially unpredictable) traffic distribution, online delay optimization can be quite challenging. While uncertainty about the current network delays may make the current routing choices sub-optimal, the algorithm can nevertheless try to learn the traffic patterns and keep adapting its choice of routing paths so as to perform nearly as well as the best static path. This online shortest path problem is a special case of online linear optimization, a problem in which an online algorithm must choose, in each round, a strategy from some compact set S ⊆ R^d so as to minimize a linear cost function which is only revealed at the end of the round. Kalai and Vempala [4] gave an algorithm for such problems in the transparent feedback model, where the entire cost function is revealed at the end of the round. Here we present an algorithm for online linear optimization in the more challenging opaque feedback model, in which only the cost of the chosen strategy is revealed at the end of the round. In the special case of shortest paths, opaque feedback corresponds to the notion that in each round the algorithm learns only the end-to-end cost of the chosen path, not the cost of every edge in the network.

We also present a second algorithm for online shortest paths, which solves the shortest-path problem using a chain of online decision oracles, one at each node of the graph. This has several advantages over the online linear optimization approach. First, it is effective against an adaptive adversary, whereas our linear optimization algorithm assumes an oblivious adversary. Second, even in the case of an oblivious adversary, the second algorithm performs better than the first, as measured by their additive regret.
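To make the setting concrete, the following is a minimal sketch of online linear optimization in the transparent feedback model, using a follow-the-perturbed-leader rule in the style of Kalai and Vempala. It is not the opaque-feedback algorithm or the oracle-chain algorithm described above. The function name, the `epsilon` parameter, the uniform perturbation range, and the toy two-path network are assumptions made for illustration; strategies are represented as edge-incidence vectors of paths.

```python
import random


def follow_perturbed_leader(strategies, cost_vectors, epsilon=0.5, seed=0):
    """Play a perturbed-leader rule over a finite strategy set (transparent feedback).

    strategies: list of d-dimensional tuples, e.g. edge-incidence vectors of paths.
    cost_vectors: list of d-dimensional cost vectors, one per round.
    Returns (algorithm_cost, best_static_cost); their difference is the additive regret.
    """
    rng = random.Random(seed)
    d = len(strategies[0])
    cumulative = [0.0] * d  # sum of the cost vectors revealed so far
    algorithm_cost = 0.0

    def dot(x, c):
        return sum(xi * ci for xi, ci in zip(x, c))

    for cost in cost_vectors:
        # Fresh perturbation each round, uniform on [0, 1/epsilon]^d (assumed choice).
        noise = [rng.uniform(0.0, 1.0 / epsilon) for _ in range(d)]
        perturbed = [s + n for s, n in zip(cumulative, noise)]
        # Pick the strategy that is cheapest against the perturbed history.
        choice = min(strategies, key=lambda x: dot(x, perturbed))
        # The chosen strategy is charged before the cost vector is revealed.
        algorithm_cost += dot(choice, cost)
        # Transparent feedback: the entire cost vector becomes known.
        cumulative = [s + c for s, c in zip(cumulative, cost)]

    # Cost of the best single strategy in hindsight (the "best static path").
    best_static_cost = min(dot(x, cumulative) for x in strategies)
    return algorithm_cost, best_static_cost


if __name__ == "__main__":
    # Hypothetical network with four edges and two disjoint two-edge paths.
    paths = [(1, 1, 0, 0), (0, 0, 1, 1)]
    rounds = [(0.1, 0.1, 0.5, 0.5)] * 50
    alg, best = follow_perturbed_leader(paths, rounds)
    print("additive regret:", alg - best)
```

In the opaque feedback model the update of `cumulative` would not be available, since only the scalar `dot(choice, cost)` is observed each round; the sketch is meant only to fix the protocol and the definition of additive regret against the best static path.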