The size of MDP factored policies
Proceedings of the Eighteenth National Conference on Artificial Intelligence
Policies of Markov Decision Processes (MDPs) determine the next action to execute from the current state and, possibly, the history (the sequence of past states). When the number of states is large, succinct representations are often used to encode both the MDP and its policies compactly. In this paper, some problems related to the size of succinctly represented policies are analyzed. In particular, it is shown that some MDPs have policies that can only be represented in space super-polynomial in the size of the MDP, unless the polynomial hierarchy collapses. This fact motivates the study of the problem of deciding whether a given MDP has a policy of a given size and reward. Since some algorithms for MDPs work by finding a succinct representation of the value function, the problem of deciding whether a value function of a given size and reward exists is also considered.
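To make the contrast between explicit and succinct policy representations concrete, here is a minimal Python sketch (not from the paper; the state variables, actions, and rule are hypothetical). States are assignments to n boolean variables, so a tabular policy needs 2^n entries, while the same mapping can sometimes be expressed as a constant-size rule evaluated on demand:

```python
from itertools import product

n = 10  # number of boolean state variables; the state space has 2**n states

def make_tabular_policy():
    """Explicit policy: one table entry per state -- size exponential in n."""
    table = {}
    for state in product([False, True], repeat=n):
        # hypothetical rule: "repair" if any variable is False, else "noop"
        table[state] = "repair" if not all(state) else "noop"
    return table

def succinct_policy(state):
    """Succinct policy: the same state-to-action mapping as a fixed-size rule."""
    return "repair" if not all(state) else "noop"

tabular = make_tabular_policy()
assert len(tabular) == 2 ** n  # exponential representation
all_true = (True,) * n
assert tabular[all_true] == succinct_policy(all_true) == "noop"
```

The paper's super-polynomial lower bound says this compression is not always available: for some MDPs, no policy of the required quality admits a small representation unless the polynomial hierarchy collapses.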