The shortest path problem under partial monitoring

  • Authors:
  • András György;Tamás Linder;György Ottucsák

  • Affiliations:
  • Informatics Laboratory, Computer and Automation Research Institute of the Hungarian Academy of Sciences, Budapest, Hungary;Informatics Laboratory, Computer and Automation Research Institute of the Hungarian Academy of Sciences, Budapest, Hungary;Department of Computer Science and Information Theory, Budapest University of Technology and Economics, Budapest, Hungary

  • Venue:
  • COLT'06 Proceedings of the 19th annual conference on Learning Theory
  • Year:
  • 2006

Abstract

The on-line shortest path problem is considered under partial monitoring scenarios. At each round, a decision maker has to choose a path between two distinguished vertices of a weighted directed acyclic graph whose edge weights can change in an arbitrary (adversarial) way, with the goal of keeping the loss of the chosen path (defined as the sum of the weights of its edges) small. In the multi-armed bandit setting, after choosing a path, the decision maker learns only the weights of those edges that belong to the chosen path. For this scenario, an algorithm is given whose average cumulative loss in n rounds exceeds that of the best path, matched off-line to the entire sequence of the edge weights, by a quantity that is proportional to $1/\sqrt{n}$ and depends only polynomially on the number of edges of the graph. The algorithm can be implemented with complexity linear in the number of rounds n and in the number of edges. This result improves on earlier bandit algorithms, whose performance bounds either depend exponentially on the number of edges or converge to zero at a slower rate than $O(1/\sqrt{n})$. An extension to the so-called label efficient setting is also given, where the decision maker is informed about the weight of the chosen path only with probability ε
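
The bandit setting described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it only models the protocol (path loss as a sum of edge weights, feedback restricted to the chosen path, and the average excess loss over the best fixed path). The graph `EDGES`, the function names, and the use of weights in [0, 1] are assumptions made for illustration, and the uniformly random player is a placeholder.

```python
import random

# Hypothetical 4-vertex DAG with source "s" and sink "t" (illustration only).
EDGES = [("s", "a"), ("s", "b"), ("a", "t"), ("b", "t"), ("a", "b")]

def enumerate_paths(edges, source, sink):
    """Return every source-to-sink path as a tuple of edges.

    Brute-force enumeration; exponential in general, fine for a tiny DAG.
    """
    if source == sink:
        return [()]
    paths = []
    for (u, v) in edges:
        if u == source:
            for rest in enumerate_paths(edges, v, sink):
                paths.append(((u, v),) + rest)
    return paths

def path_loss(path, weights):
    """Loss of a path: the sum of the weights of its edges."""
    return sum(weights[e] for e in path)

def bandit_feedback(path, weights):
    """Bandit feedback: only the weights of edges on the chosen path are revealed."""
    return {e: weights[e] for e in path}

def average_regret(chosen_paths, weight_sequence, all_paths):
    """Average cumulative loss minus that of the best fixed path chosen off-line."""
    n = len(weight_sequence)
    learner = sum(path_loss(p, w) for p, w in zip(chosen_paths, weight_sequence))
    best = min(sum(path_loss(p, w) for w in weight_sequence) for p in all_paths)
    return (learner - best) / n

if __name__ == "__main__":
    rng = random.Random(0)
    paths = enumerate_paths(EDGES, "s", "t")
    weight_sequence, chosen = [], []
    for _ in range(1000):
        # Weights are drawn at random here; in the paper they may be adversarial.
        weights = {e: rng.random() for e in EDGES}
        path = rng.choice(paths)  # placeholder player, not a learning algorithm
        observed = bandit_feedback(path, weights)  # all the player gets to see
        assert set(observed) == set(path)
        weight_sequence.append(weights)
        chosen.append(path)
    print("average regret:", average_regret(chosen, weight_sequence, paths))
```

A learning algorithm would replace `rng.choice(paths)` with a rule that uses only the `observed` weights from earlier rounds; the abstract's claim is that this can be done so that the quantity returned by `average_regret` decays proportionally to $1/\sqrt{n}$.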