A framework and a mean-field algorithm for the local control of spatial processes
International Journal of Approximate Reasoning
In this article, we consider a compact representation of multidimensional Markov Decision Processes based on graphs (GMDP). The states and actions of a GMDP are multidimensional and attached to the vertices of a graph, allowing local dynamics and rewards to be represented. This approach follows the line of work based on Dynamic Bayesian Networks. For policy optimisation, a direct application of the Policy Iteration algorithm, whose complexity is exponential in the number of nodes of the graph, is infeasible for such high-dimensional problems; we therefore propose an approximate version of this algorithm derived from the GMDP representation. Rather than approximating the value function directly, as is usually done, we propose an approximation of the occupation measure of the model based on the mean-field principle. We then use it to compute the value function and to derive approximate policy evaluation and policy improvement methods. Their combination yields an approximate Policy Iteration algorithm whose complexity is linear in the number of nodes of the graph. Comparisons with the optimal solution, when available, and with a naive short-term policy demonstrate the quality of the proposed procedure.
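To make the structure of such an algorithm concrete, here is a minimal, illustrative Python sketch of mean-field approximate policy iteration on a toy GMDP. Everything below is an assumption made for illustration: the ring topology, the binary "epidemic-style" local dynamics and rewards, the truncated-horizon evaluation, and the node-by-node greedy improvement are stand-ins, not the authors' model or algorithm. What the sketch does show is the idea from the abstract: the joint state distribution is replaced by a product of per-node marginals (a mean-field approximation of the occupation measure), which is propagated through the local transition functions and used for approximate policy evaluation and improvement.

```python
import numpy as np

N = 6            # number of graph nodes (ring topology, an assumption)
GAMMA = 0.9      # discount factor
HORIZON = 50     # truncation horizon for the discounted sum
neighbors = [((i - 1) % N, (i + 1) % N) for i in range(N)]

def local_transition(s_i, a_i, p_nbr):
    """P(next state of node i is 1), given its state s_i, its action a_i,
    and the mean-field probability p_nbr that a neighbour is in state 1.
    Epidemic-style toy dynamics, assumed for illustration only."""
    if s_i == 1:
        return 0.3 if a_i == 1 else 0.8               # treating speeds recovery
    return 0.5 * p_nbr * (0.2 if a_i == 1 else 1.0)   # protection lowers infection

def local_reward(s_i, a_i):
    return (1.0 if s_i == 0 else 0.0) - 0.2 * a_i     # healthy pays, acting costs

def mean_field_evaluation(policy, q0):
    """Approximate policy evaluation: propagate product-form marginals
    q[i] = P(s_i = 1) forward in time and accumulate the discounted
    expected local rewards, node by node."""
    q = q0.copy()
    value = np.zeros(N)
    for t in range(HORIZON):
        for i in range(N):
            for s in (0, 1):
                p_s = q[i] if s == 1 else 1.0 - q[i]
                value[i] += GAMMA ** t * p_s * local_reward(s, policy[i][s])
        q_next = np.empty(N)
        for i in range(N):
            p_nbr = np.mean([q[j] for j in neighbors[i]])
            q_next[i] = sum(
                (q[i] if s == 1 else 1.0 - q[i])
                * local_transition(s, policy[i][s], p_nbr)
                for s in (0, 1)
            )
        q = q_next
    return value

def improve(policy, q0):
    """Greedy node-by-node improvement: for each node, try every local
    policy (a map from local state to local action) with the other
    nodes held fixed, and keep the best one."""
    candidates = [(a0, a1) for a0 in (0, 1) for a1 in (0, 1)]
    for i in range(N):
        best, best_val = policy[i], mean_field_evaluation(policy, q0)[i]
        for cand in candidates:
            trial = list(policy)
            trial[i] = cand
            val = mean_field_evaluation(trial, q0)[i]
            if val > best_val:
                best, best_val = cand, val
        policy[i] = best
    return policy

q0 = np.full(N, 0.5)                  # initial marginals: state 1 w.p. 0.5
policy = [(0, 1)] * N                 # start by acting only in state 1
for _ in range(3):                    # a few approximate PI sweeps
    policy = improve(policy, q0)
print("policy:", policy)
print("values:", np.round(mean_field_evaluation(policy, q0), 2))
```

For simplicity, the improvement step above re-runs the full mean-field evaluation for each candidate local policy, which makes a sweep quadratic in the number of nodes; the algorithm described in the abstract instead exploits the locality of the GMDP representation to keep the overall complexity linear in the number of nodes.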