Value iteration and action ε-approximation of optimal policies in discounted Markov decision processes

  • Authors:
  • Raúl Montes-De-Oca; Enrique Lemus-Rodríguez

  • Affiliations:
  • Departamento de Matemáticas, Universidad Autónoma Metropolitana-Iztapalapa, México D.F., México; Escuela de Actuaría, Universidad Anáhuac México-Norte, México, México

  • Venue:
  • MATH'09 Proceedings of the 14th WSEAS International Conference on Applied Mathematics
  • Year:
  • 2009

Abstract

It is well known that in Markov decision processes with a total discounted reward it is not always possible to find the optimal stationary policy f* explicitly. Value iteration, however, produces at its N-th step a maximizer stationary policy fN whose discounted reward is close to that of f*. A natural question then arises: are the actions f*(x) and fN(x) necessarily close for each state x? To our knowledge this question is still largely open. This paper studies when the value iteration algorithm can be stopped so that the corresponding maximizer stationary policy fN approximates an optimal policy both in the total discounted reward and in the action space (uniformly over the state space). Results of this kind shed light on computability issues of great practical interest. In this article the action space is assumed to be a compact set and the reward function bounded. An ergodicity condition on the transition probability law and a structural condition on the reward function are also needed. Under these conditions, an upper bound is obtained on the number of value iteration steps required for the corresponding maximizer to be a uniform approximation of the optimal policy.
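
The paper's bound on N, derived under its ergodicity and structural conditions, guarantees closeness in the action space as well. For contrast, the sketch below shows only the standard procedure that the abstract starts from: value iteration with the classical sup-norm stopping rule, which guarantees that the maximizer policy fN is ε-optimal in total discounted reward but says nothing about closeness of actions. The finite-state, finite-action model, the arrays P and r, and the function name value_iteration are illustrative assumptions, not the authors' construction.

    import numpy as np

    def value_iteration(P, r, alpha, eps):
        """Value iteration for a finite discounted MDP (illustrative sketch).

        P[a, x, y] : transition probability from state x to state y under action a.
        r[a, x]    : bounded one-step reward for action a in state x.
        alpha      : discount factor in (0, 1).
        eps        : target accuracy of the maximizer policy's discounted reward.

        Returns (V, f_N, N): the final value iterate, the maximizer (greedy)
        stationary policy f_N, and the number of iterations N.
        """
        n_actions, n_states, _ = P.shape
        V = np.zeros(n_states)
        # Classical stopping threshold: when successive iterates differ by less
        # than eps * (1 - alpha) / (2 * alpha) in sup-norm, the policy greedy
        # with respect to the last iterate is eps-optimal in discounted reward.
        tol = eps * (1.0 - alpha) / (2.0 * alpha)
        N = 0
        while True:
            # Q[a, x] = r(a, x) + alpha * sum_y P(y | x, a) * V(y)
            Q = r + alpha * P @ V
            V_new = Q.max(axis=0)
            N += 1
            if np.max(np.abs(V_new - V)) < tol:
                V = V_new
                break
            V = V_new
        # Maximizer stationary policy f_N taken with respect to the final iterate.
        Q = r + alpha * P @ V
        f_N = Q.argmax(axis=0)
        return V, f_N, N

Whether this fN is also close to f*(x) action by action, uniformly in x, is exactly the question the paper addresses under its additional conditions.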