The on-line shortest path problem is considered under partial monitoring scenarios. At each round, a decision maker has to choose a path between two distinguished vertices of a weighted directed acyclic graph whose edge weights can change in an arbitrary (adversarial) way, with the goal of keeping the loss of the chosen path (defined as the sum of the weights of its constituent edges) small. In the multi-armed bandit setting, after choosing a path, the decision maker learns only the weights of those edges that belong to the chosen path. For this scenario, an algorithm is given whose average cumulative loss in $n$ rounds exceeds that of the best path, matched off-line to the entire sequence of the edge weights, by a quantity that is proportional to $1/\sqrt{n}$ and depends only polynomially on the number of edges of the graph. The algorithm can be implemented with linear complexity in the number of rounds $n$ and in the number of edges. This result improves on earlier bandit algorithms, whose performance bounds either depend exponentially on the number of edges or converge to zero at a slower rate than $O(1/\sqrt{n})$. An extension to the so-called label efficient setting is also given, where the decision maker is informed about the weight of the chosen path only with probability ε
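
To make the bandit feedback protocol concrete, the following is a minimal sketch of the setting, not the algorithm of the paper. It assumes a hypothetical five-edge DAG (`EDGES`), enumerates every source-to-sink path explicitly, and runs a generic Exp3-style exponentially weighted learner over those paths. Enumerating paths is exactly the approach whose cost can grow exponentially with the number of edges, which the abstract contrasts with its linear-complexity algorithm. The names `enumerate_paths` and `run_bandit` and the parameters `eta` and `gamma` are illustrative assumptions.

```python
import math
import random

# Hypothetical toy DAG: (tail, head) pairs, source "s", sink "t".
EDGES = [("s", "a"), ("s", "b"), ("a", "t"), ("b", "t"), ("a", "b")]


def enumerate_paths(edges, source, sink):
    """Return every source-to-sink path as a tuple of edge indices."""
    paths = []

    def walk(node, used):
        if node == sink:
            paths.append(tuple(used))
            return
        for i, (tail, head) in enumerate(edges):
            if tail == node:
                walk(head, used + [i])

    walk(source, [])
    return paths


def run_bandit(paths, loss_fn, n_rounds, eta, gamma, rng):
    """Exp3-style play over enumerated paths (an assumed, generic learner).

    loss_fn(t) returns the edge weights in [0, 1] for round t. The learner
    only reads the weights of edges on the path it chose (bandit feedback).
    Returns the learner's cumulative loss and each path's cumulative loss.
    """
    k = len(paths)
    max_len = max(len(p) for p in paths)  # scales path losses into [0, 1]
    est = [0.0] * k                       # importance-weighted loss estimates
    learner_loss = 0.0
    path_totals = [0.0] * k               # bookkeeping for regret, hidden from the learner

    for t in range(n_rounds):
        edge_losses = loss_fn(t)

        # Exponential weights mixed with uniform exploration.
        base = min(est)
        weights = [math.exp(-eta * (e - base)) for e in est]
        z = sum(weights)
        probs = [(1 - gamma) * w / z + gamma / k for w in weights]

        choice = rng.choices(range(k), weights=probs)[0]

        # Bandit feedback: only the chosen path's edge weights are observed.
        loss = sum(edge_losses[i] for i in paths[choice])
        learner_loss += loss
        est[choice] += (loss / max_len) / probs[choice]

        for j, p in enumerate(paths):
            path_totals[j] += sum(edge_losses[i] for i in p)

    return learner_loss, path_totals
```

Calling `run_bandit(enumerate_paths(EDGES, "s", "t"), loss_fn, n, eta, gamma, random.Random(0))` and dividing `learner_loss - min(path_totals)` by `n` gives the average excess loss over the best fixed path, the quantity the abstract bounds by a term proportional to $1/\sqrt{n}$.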