Simulation-based policy generation using large-scale Markov decision processes
IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans
The value iteration algorithm is a well-known technique for generating solutions to discounted Markov decision process (MDP) models. Although simple to implement, the approach is limited in situations where many MDPs must be solved, such as real-time state-based control problems or simulation/optimization problems, because the value function may require a large number of iterations to converge to an ε-optimal solution. Experimental results suggest, however, that the sequence of solution policies associated with successive iterations of the algorithm converges much more rapidly than the value function does. This behavior has significant implications for the design of MDP solution approaches, yet it has neither been explicitly characterized in the literature nor generated significant discussion. This paper seeks to generate such discussion by providing comparative empirical convergence results and by exploring several predictors that estimate policy convergence speed from existing MDP parameters.
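The contrast the abstract describes is easy to reproduce on a small synthetic model. The sketch below is not taken from the paper; the MDP, discount factor, and tolerance are illustrative assumptions. It runs standard value iteration on a random discounted MDP and records both the iteration at which the value function meets a conventional ε-optimal stopping rule and the iteration at which the greedy policy last changed; on most random instances the policy settles long before the value-based test is satisfied.

```python
# Minimal sketch: value iteration on a random discounted MDP, tracking
# when the greedy policy stabilizes versus when the value function meets
# an epsilon-optimal stopping rule. All parameters below are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma, eps = 20, 4, 0.95, 1e-6

# Random transition tensor P[a, s, s'] (rows normalized) and rewards R[s, a].
P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)
R = rng.random((n_states, n_actions))

V = np.zeros(n_states)
prev_policy = None
last_change = 0

for k in range(1, 10_000):
    # Q[s, a] = R[s, a] + gamma * sum_{s'} P[a, s, s'] * V[s']
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    V_new = Q.max(axis=1)
    policy = Q.argmax(axis=1)

    # Record the last iteration at which the greedy policy changed.
    if prev_policy is None or not np.array_equal(policy, prev_policy):
        last_change = k
    prev_policy = policy

    # Standard epsilon-optimal stopping rule on successive value functions.
    if np.max(np.abs(V_new - V)) < eps * (1 - gamma) / (2 * gamma):
        V = V_new
        break
    V = V_new

print(f"value function met the eps-optimal stopping rule at iteration {k}")
print(f"greedy policy last changed at iteration {last_change}")
```

The stopping rule used here is the standard one for discounted MDPs: when the sup-norm difference between successive value functions falls below ε(1 − γ)/(2γ), the greedy policy with respect to the latest value function is guaranteed ε-optimal. The gap between `last_change` and the final `k` is the phenomenon the paper examines empirically.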