Mean Field Approximation of the Policy Iteration Algorithm for Graph-based Markov Decision Processes

  • Authors:
  • Nathalie Peyrard; Régis Sabbadin

  • Affiliations:
  • INRA, Avignon, France, email: peyrard@avignon.inra.fr; INRA, Toulouse, France, email: sabbadin@toulouse.inra.fr

  • Venue:
  • Proceedings of ECAI 2006: 17th European Conference on Artificial Intelligence, August 29 - September 1, 2006, Riva del Garda, Italy
  • Year:
  • 2006


Abstract

In this article, we consider a compact representation of multidimensional Markov Decision Processes based on Graphs (GMDP). The states and actions of a GMDP are multidimensional and attached to the vertices of a graph, allowing the representation of local dynamics and rewards. This approach is in line with approaches based on Dynamic Bayesian Networks. For policy optimisation, a direct application of the Policy Iteration algorithm, whose complexity is exponential in the number of nodes of the graph, is not possible for such high-dimensional problems, so we propose an approximate version of this algorithm derived from the GMDP representation. We do not try to approximate the value function directly, as is usually done; instead, we propose an approximation of the occupation measure of the model, based on the mean field principle. We then use it to compute the value function and to derive approximate policy evaluation and policy improvement methods. Their combination yields an approximate Policy Iteration algorithm whose complexity is linear in the number of nodes of the graph. Comparisons with the optimal solution, when available, and with a naive short-term policy demonstrate the quality of the proposed procedure.
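
The abstract describes the approach only at a high level. The toy sketch below is not the authors' code: the graph, dynamics, rewards, names, and parameters are all invented for illustration. It only shows the overall structure of a mean-field-style approximate Policy Iteration on a graph-based MDP: per-node marginals stand in for the joint occupation measure, the value is the discounted sum of expected local rewards under these marginals, and policy improvement is done greedily node by node, so each iteration costs time linear in the number of nodes.

import numpy as np

N_NODES = 4
NEIGHBOURS = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}   # a small chain graph
GAMMA = 0.9        # discount factor
HORIZON = 30       # truncation horizon for approximate evaluation

def local_transition(p_self, p_neigh, action):
    # Probability that node i is "healthy" (state 1) at t+1, given its own
    # marginal p_self, the mean of its neighbours' marginals p_neigh
    # (the mean-field coupling), and its local action (1 = treat).
    recovery = 0.6 if action == 1 else 0.1
    contamination = 0.4 * (1.0 - p_neigh)
    return p_self * (1.0 - contamination) + (1.0 - p_self) * recovery

def local_reward(p_self, action):
    # Expected local reward: +1 for being healthy, -0.3 for treating.
    return p_self - 0.3 * action

def mean_field_evaluation(policy, init_marginals):
    # Approximate policy evaluation: propagate per-node marginals (a
    # product-form surrogate of the occupation measure) and accumulate
    # the discounted sum of expected local rewards.
    marginals = np.array(init_marginals, dtype=float)
    value = 0.0
    for t in range(HORIZON):
        actions = [policy[i](marginals[i]) for i in range(N_NODES)]
        value += GAMMA ** t * sum(local_reward(marginals[i], actions[i])
                                  for i in range(N_NODES))
        marginals = np.array([
            local_transition(marginals[i],
                             np.mean([marginals[j] for j in NEIGHBOURS[i]]),
                             actions[i])
            for i in range(N_NODES)])
    return value

def improve_policy(policy, init_marginals):
    # Greedy local improvement: for each node in turn, keep the constant
    # local action that maximises the mean-field value estimate.
    new_policy = dict(policy)
    for i in range(N_NODES):
        best_action, best_value = 0, -np.inf
        for a in (0, 1):
            candidate = dict(new_policy)
            candidate[i] = (lambda p, a=a: a)
            v = mean_field_evaluation(candidate, init_marginals)
            if v > best_value:
                best_action, best_value = a, v
        new_policy[i] = (lambda p, a=best_action: a)
    return new_policy

if __name__ == "__main__":
    init = [0.5] * N_NODES
    policy = {i: (lambda p: 0) for i in range(N_NODES)}   # start: never treat
    for it in range(3):                                    # approximate PI loop
        print(f"iteration {it}: value ~ {mean_field_evaluation(policy, init):.3f}")
        policy = improve_policy(policy, init)

In the paper itself, local policies depend on the neighbourhood state and the marginal updates follow from applying the mean field principle to the exact occupation measure of the GMDP; the sketch only mirrors that structure with simplified, assumed dynamics.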