Online learning in Markov decision processes with arbitrarily changing rewards and transitions

  • Authors: Jia Yuan Yu; Shie Mannor
  • Affiliations: Department of Electrical and Computer Engineering, McGill University; Department of Electrical and Computer Engineering, McGill University, and Department of Electrical Engineering, Technion
  • Venue: GameNets'09 Proceedings of the First ICST international conference on Game Theory for Networks
  • Year: 2009

Abstract

We consider decision-making problems in Markov decision processes where both the rewards and the transition probabilities vary in an arbitrary (e.g., non-stationary) fashion. We present algorithms that combine online learning and robust control, and establish guarantees on their performance evaluated in retrospect against alternative policies, i.e., their regret. These guarantees depend critically on the range of uncertainty in the transition probabilities, but hold regardless of the changes in rewards and transition probabilities over time. We present a version of the main algorithm in the setting where the decision-maker's observations are limited to its trajectory, and another version that allows a trade-off between performance and computational complexity.
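To make the notion of regret against alternative policies concrete, the following is a minimal sketch (not the paper's algorithm) of the classical exponential-weights forecaster over a finite set of candidate policies, with per-round rewards that may change arbitrarily. It deliberately ignores transition dynamics, which is the part the paper's robust-control machinery handles; the function name and parameters are illustrative assumptions.

```python
import math

def exp_weights_regret(reward_seq, eta=0.1):
    """Run exponential weights over a fixed set of candidate policies.

    reward_seq: list of per-round reward vectors in [0, 1], one entry per
    policy; rewards may change arbitrarily between rounds (a simplified
    version of the paper's setting, with transition dynamics ignored).
    Returns (algorithm's expected cumulative reward, best fixed policy's
    cumulative reward); their difference is the regret in hindsight.
    """
    n = len(reward_seq[0])
    weights = [1.0] * n          # one weight per candidate policy
    alg_total = 0.0              # algorithm's expected cumulative reward
    cum = [0.0] * n              # cumulative reward of each fixed policy
    for r in reward_seq:
        z = sum(weights)
        probs = [w / z for w in weights]
        alg_total += sum(p * ri for p, ri in zip(probs, r))
        for i in range(n):
            cum[i] += r[i]
            weights[i] *= math.exp(eta * r[i])  # multiplicative update
    return alg_total, max(cum)
```

Even when the reward vectors change adversarially from round to round, this scheme's cumulative reward trails the best fixed policy in hindsight by only O(sqrt(T log n)), which is the flavor of guarantee the abstract refers to as regret.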