Adaptive Markov Control Processes
Markov Decision Processes: Discrete Stochastic Dynamic Programming
Markov decision processes (MDPs) have been used repeatedly in economics and engineering, but they appear to remain far from their full potential because of the computational difficulties inherent to the subject, in particular the usual impossibility of finding explicit optimal solutions. Value iteration is an elegant theoretical method for approximating an optimal solution, frequently mentioned in economics when MDPs are used. To extend its use and benefits, a better understanding of its convergence is still needed, even if this would appear not to be the case; for instance, the convergence properties of the corresponding policies are still not well understood. In this paper we analyze this issue further: if value iteration yields a stationary policy fN at the N-th iteration such that the discounted rewards of the optimal policy f* and of fN are close, are the corresponding actions f*(x) and fN(x) necessarily close for each state x? To our knowledge this question is still largely open. This paper studies when the value iteration algorithm can be stopped so that the corresponding maximizer stationary policy fN approximates an optimal policy both in total discounted reward and in the action space (uniformly over the state space). The action space is assumed to be compact and the reward function bounded. An ergodicity condition on the transition probability law and a structural condition on the reward function are also required. Under these conditions, an upper bound is obtained on the number of value iteration steps needed for the corresponding maximizer to be a uniform approximation of the optimal policy.
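To make the setting concrete, the following is a minimal sketch of value iteration with the classical stopping rule for the discounted-reward criterion. The toy two-state MDP (transition matrices `P`, rewards `R`, discount factor `gamma`) is a hypothetical illustration, not an example from the paper; it shows how a maximizer stationary policy fN is read off from the N-th iterate, which is precisely the object whose closeness to f* in the action space the paper investigates.

```python
# Hypothetical toy MDP (illustration only, not from the paper):
# 2 states, 2 actions; P[a][s][s2] transition probabilities, R[s][a] bounded rewards.
P = [
    [[0.9, 0.1], [0.2, 0.8]],   # action 0
    [[0.5, 0.5], [0.6, 0.4]],   # action 1
]
R = [[1.0, 0.0], [0.0, 2.0]]
gamma = 0.9   # discount factor
eps = 1e-6    # target accuracy in total discounted reward

# Classical stopping rule: once successive iterates are within
# eps * (1 - gamma) / (2 * gamma) in sup-norm, the greedy (maximizer)
# policy is eps-optimal in discounted reward. Note this guarantees
# closeness of values, not of the actions themselves -- the gap the
# paper addresses.
threshold = eps * (1 - gamma) / (2 * gamma)

V = [0.0, 0.0]
while True:
    # Q[s][a] = r(s, a) + gamma * E[ V(next state) | s, a ]
    Q = [[R[s][a] + gamma * sum(P[a][s][s2] * V[s2] for s2 in range(2))
          for a in range(2)] for s in range(2)]
    V_new = [max(Q[s]) for s in range(2)]
    if max(abs(V_new[s] - V[s]) for s in range(2)) < threshold:
        V = V_new
        break
    V = V_new

# Maximizer stationary policy fN: the action attaining the max in each state.
policy = [max(range(2), key=lambda a: Q[s][a]) for s in range(2)]
```

In this toy instance the iteration converges geometrically at rate `gamma`, and the resulting `policy` selects the high-reward action in each state; the paper's contribution is an a priori bound on the number of such iterations after which `policy` approximates f* uniformly over states, under the compactness, ergodicity, and structural conditions stated above.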