Pure stationary optimal strategies in Markov decision processes

  • Authors:
  • Hugo Gimbert

  • Affiliations:
  • LIX, Ecole Polytechnique, France

  • Venue:
  • STACS'07 Proceedings of the 24th annual conference on Theoretical aspects of computer science
  • Year:
  • 2007

Abstract

Markov decision processes (MDPs) are controllable discrete event systems with stochastic transitions. The performance of an MDP is evaluated by a payoff function, and the controller of the MDP seeks to optimize this performance using optimal strategies. There are various ways of measuring performance, i.e. various classes of payoff functions. For example, average performance can be evaluated by a mean-payoff function, peak performance by a limsup payoff function, and the parity payoff function can be used to encode logical specifications. Surprisingly, all MDPs equipped with mean-payoff, limsup or parity payoff functions share a common non-trivial property: they admit pure stationary optimal strategies. In this paper, we introduce the class of prefix-independent and submixing payoff functions, and we prove that any MDP equipped with such a payoff function admits pure stationary optimal strategies. This result unifies and simplifies several existing proofs. Moreover, it is a key tool for generating new examples of MDPs with pure stationary optimal strategies.
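To make the three payoff classes concrete, here is a minimal Python sketch (not taken from the paper) that evaluates each of them on a single ultimately periodic play, written as a finite prefix followed by a cycle repeated forever. The function names are hypothetical. The parity convention used here (the play wins when the highest priority seen infinitely often is even) is an assumption, since conventions vary in the literature. Each function ignores the prefix, which illustrates prefix-independence on this restricted class of plays.

```python
from fractions import Fraction


def mean_payoff(prefix, cycle):
    # For a reward sequence prefix . cycle^omega, the running average
    # converges to the average over one cycle; the prefix has no effect.
    return Fraction(sum(cycle), len(cycle))


def limsup_payoff(prefix, cycle):
    # The rewards seen infinitely often are exactly those in the cycle,
    # so the limsup of the sequence is the largest reward in the cycle.
    return max(cycle)


def parity_payoff(prefix, cycle):
    # Assumed convention: payoff 1 if the highest priority seen infinitely
    # often (i.e. the highest priority in the cycle) is even, else 0.
    return 1 if max(cycle) % 2 == 0 else 0


if __name__ == "__main__":
    prefix, cycle = [5, 0, 7], [1, 2, 3]
    print(mean_payoff(prefix, cycle))    # 2
    print(limsup_payoff(prefix, cycle))  # 3
    print(parity_payoff(prefix, cycle))  # 0
    # Prefix-independence: changing the finite prefix never changes the payoff.
    for f in (mean_payoff, limsup_payoff, parity_payoff):
        assert f(prefix, cycle) == f([], cycle) == f([100, -4], cycle)
```

Restricting to ultimately periodic plays keeps every limit exact. Plays of an MDP under a general strategy are infinite random sequences, so this sketch only shows how each payoff function reads an individual play, not how optimal strategies are computed.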