The steady-state control problem for Markov decision processes

  • Authors:
  • S. Akshay; Nathalie Bertrand; Serge Haddad; Loïc Hélouët

  • Affiliations:
  • Inria Rennes, France & IIT Bombay, India; Inria Rennes, France; LSV, ENS Cachan & CNRS & INRIA, France; Inria Rennes, France

  • Venue:
  • QEST'13: Proceedings of the 10th International Conference on Quantitative Evaluation of Systems
  • Year:
  • 2013

Abstract

This paper addresses a control problem for probabilistic models in the setting of Markov decision processes (MDP). We are interested in the steady-state control problem, which asks, given an ergodic MDP $\mathcal{M}$ and a distribution $\delta_{goal}$, whether there exists a (history-dependent randomized) policy π ensuring that the steady-state distribution of $\mathcal{M}$ under π is exactly $\delta_{goal}$. We first show that stationary randomized policies suffice to achieve a given steady-state distribution. From this we infer that the steady-state control problem is decidable for MDP, since it can be reformulated as a linear program, which is solvable in PTIME. This decidability result extends to labeled MDP (LMDP), where the objective is a steady-state distribution on labels carried by the states, and we provide a PSPACE algorithm. We also show that a related steady-state language inclusion problem is decidable in EXPTIME for LMDP. Finally, we prove that if we consider MDP under partial observation (POMDP), the steady-state control problem becomes undecidable.
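To illustrate the linear-programming reduction mentioned in the abstract, the sketch below encodes the steady-state control problem as a feasibility LP over stationary state-action frequencies x[s,a]: the marginals of x must match the target distribution, and x must satisfy the stationary balance equations. The MDP (matrix `P`) and target `delta_goal` are hypothetical toy data, not taken from the paper; the reduction itself follows the standard occupation-measure formulation.

```python
# Sketch: steady-state control as a feasibility linear program.
# The 2-state, 2-action MDP below is an invented example for illustration.
import numpy as np
from scipy.optimize import linprog

# P[s][a] = transition distribution from state s under action a
P = np.array([
    [[0.5, 0.5], [0.9, 0.1]],   # state 0, actions 0 and 1
    [[0.5, 0.5], [0.1, 0.9]],   # state 1, actions 0 and 1
])
n_states, n_actions, _ = P.shape
delta_goal = np.array([0.5, 0.5])  # target steady-state distribution

# Variables: x[s,a] >= 0, flattened row-major.
n = n_states * n_actions
A_eq, b_eq = [], []

# (1) Marginals: sum_a x[s,a] = delta_goal[s] for every state s.
for s in range(n_states):
    row = np.zeros(n)
    row[s * n_actions:(s + 1) * n_actions] = 1.0
    A_eq.append(row)
    b_eq.append(delta_goal[s])

# (2) Balance: sum_{s',a} x[s',a] * P[s'][a][s] = delta_goal[s],
#     i.e. delta_goal is stationary under the induced Markov chain.
for s in range(n_states):
    row = np.zeros(n)
    for sp in range(n_states):
        for a in range(n_actions):
            row[sp * n_actions + a] = P[sp][a][s]
    A_eq.append(row)
    b_eq.append(delta_goal[s])

# Pure feasibility: zero objective.
res = linprog(c=np.zeros(n), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
              bounds=[(0, None)] * n)

if res.success:
    x = res.x.reshape(n_states, n_actions)
    # Extract a stationary randomized policy: pi(a|s) = x[s,a] / sum_a x[s,a]
    pi = x / x.sum(axis=1, keepdims=True)
    print("feasible; stationary randomized policy:\n", pi)
else:
    print("no stationary policy achieves delta_goal")
```

Since stationary randomized policies suffice (the paper's first result), feasibility of this LP is equivalent to the existence of *any* history-dependent randomized policy achieving the target, and the LP is solvable in polynomial time.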