Nonapproximability results for partially observable Markov decision processes

  • Authors:
  • Christopher Lusena; Judy Goldsmith; Martin Mundhenk

  • Affiliations:
  • Dept. of Computer Science, University of Kentucky, Lexington, KY; Dept. of Computer Science, University of Kentucky, Lexington, KY; FB IV - Informatik, Universität Trier, Trier, Germany

  • Venue:
  • Journal of Artificial Intelligence Research
  • Year:
  • 2001

Abstract

We show that, for several variations of partially observable Markov decision processes, polynomial-time algorithms for finding control policies either provably lack, or are unlikely to have, guarantees of finding policies within a constant factor or a constant summand of optimal. Here "unlikely" means "unless some complexity classes collapse," where the collapses considered are P = NP, P = PSPACE, or P = EXP. Until or unless these collapses are shown to hold, any control-policy designer must choose between such performance guarantees and efficient computation.
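For concreteness, the two kinds of guarantee ruled out in the abstract can be stated in standard approximation-algorithm terms. The notation below (V for the value of a policy, π* for an optimal policy) is a conventional formulation chosen here for illustration, not notation quoted from the paper:

```latex
% Let $V(\pi)$ denote the expected value of policy $\pi$ under the chosen
% performance criterion, and let $\pi^*$ be an optimal policy.
% A polynomial-time algorithm that returns a policy $\pi$ gives
\[
V(\pi) \;\ge\; \frac{1}{c}\, V(\pi^*)
  \quad \text{(a constant-factor guarantee, for some fixed } c \ge 1\text{)},
\]
\[
V(\pi) \;\ge\; V(\pi^*) - \epsilon
  \quad \text{(a constant-summand guarantee, for some fixed } \epsilon \ge 0\text{)}.
\]
```

The paper's results say that, for the POMDP variants considered, no polynomial-time algorithm achieves either inequality for any fixed c or ε unless the stated complexity-class collapses hold.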