On the Empirical State-Action Frequencies in Markov Decision Processes Under General Policies

  • Authors:
  • Shie Mannor; John N. Tsitsiklis

  • Affiliations:
  • Department of Electrical and Computer Engineering, McGill University, 3480 University Street, Montreal, Québec, Canada H3A 2A7; Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139

  • Venue:
  • Mathematics of Operations Research
  • Year:
  • 2005

Abstract

We consider the empirical state-action frequencies and the empirical reward in weakly communicating finite-state Markov decision processes under general policies. We define a certain polytope and establish that every element of this polytope is the limit of the empirical frequency vector, under some policy, in a strong sense. Furthermore, we show that the probability of exceeding a given distance between the empirical frequency vector and the polytope decays exponentially with time under every policy. We provide similar results for vector-valued empirical rewards.
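
The abstract does not define the polytope. The sketch below assumes it is the standard set of state-action frequency vectors x that are nonnegative, sum to one, and satisfy flow balance: for every state s', sum_a x(s', a) = sum_{s, a} x(s, a) P(s' | s, a). Under that assumption, it simulates a small communicating MDP under a time-dependent, non-stationary policy and measures how far the empirical frequency vector is from satisfying flow balance. The two-state MDP, the policy, and all function names are hypothetical and chosen only for illustration; none of them comes from the paper.

```python
import random

def simulate_frequencies(P, policy, n_steps, seed=0):
    """Empirical state-action frequencies after n_steps.

    P[s][a] is a list of next-state probabilities.
    policy(s, t, rng) returns an action and may depend on time t.
    """
    rng = random.Random(seed)
    n_states, n_actions = len(P), len(P[0])
    counts = [[0] * n_actions for _ in range(n_states)]
    s = 0  # arbitrary initial state
    for t in range(n_steps):
        a = policy(s, t, rng)
        counts[s][a] += 1
        s = rng.choices(range(n_states), weights=P[s][a])[0]
    return [[c / n_steps for c in row] for row in counts]

def balance_residual(P, x):
    """Largest flow-balance violation over states s':
    | sum_a x(s', a) - sum_{s, a} x(s, a) * P(s' | s, a) |."""
    n_states, n_actions = len(P), len(P[0])
    worst = 0.0
    for s2 in range(n_states):
        outflow = sum(x[s2])
        inflow = sum(x[s][a] * P[s][a][s2]
                     for s in range(n_states) for a in range(n_actions))
        worst = max(worst, abs(outflow - inflow))
    return worst

def switching_policy(s, t, rng):
    """Hypothetical non-stationary policy: alternates every 50 steps
    between a deterministic rule and uniformly random actions."""
    if (t // 50) % 2 == 0:
        return 0 if s == 0 else 1
    return rng.randrange(2)

# Toy two-state, two-action MDP (illustrative numbers only).
P = [
    [[0.9, 0.1], [0.2, 0.8]],  # transitions from state 0 under actions 0, 1
    [[0.7, 0.3], [0.1, 0.9]],  # transitions from state 1 under actions 0, 1
]

x = simulate_frequencies(P, switching_policy, n_steps=20000)
print("empirical frequencies:", x)
print("flow-balance residual:", balance_residual(P, x))
```

With a long horizon the residual should be small, which is consistent with the abstract's claim that the empirical frequency vector stays close to the polytope under every policy. The simulation only illustrates that behaviour on one example and does not verify the exponential decay rate.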