Value Iteration over Belief Subspace
ECSQARU '01 Proceedings of the 6th European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty
Finding optimal policies for general partially observable Markov decision processes (POMDPs) is computationally difficult, primarily because dynamic-programming (DP) updates must be performed over the entire belief space. In this paper, we first study a restricted class of POMDPs, called almost-discernible POMDPs, and propose an anytime algorithm called space-progressive value iteration (SPVI). SPVI does not perform DP updates over the entire belief space; rather, it restricts DP updates to a belief subspace that grows over time. We argue that, given sufficient time, SPVI can find near-optimal policies for almost-discernible POMDPs. We then show how SPVI can be applied to a more general class of POMDPs. Empirical results demonstrate the effectiveness of SPVI.
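The central idea — performing DP backups only over a belief subspace that grows over time, rather than the full belief simplex — can be illustrated with a minimal point-based sketch on a toy POMDP. This is an illustrative sketch only, not the paper's SPVI algorithm: the model numbers, the one-step-reachability expansion rule, and the spacing threshold are all hypothetical choices made for the example.

```python
# Sketch: value iteration restricted to a growing belief subspace.
# Toy 2-state, 2-action, 2-observation POMDP; all numbers hypothetical.
import itertools

S, A, O = 2, 2, 2
gamma = 0.95
T = [[[0.9, 0.1], [0.1, 0.9]],   # T[a][s][s'] transition probabilities
     [[0.5, 0.5], [0.5, 0.5]]]
Z = [[[0.8, 0.2], [0.3, 0.7]],   # Z[a][s'][o] observation probabilities
     [[0.6, 0.4], [0.4, 0.6]]]
R = [[1.0, 0.0], [0.0, 1.0]]     # R[a][s] immediate rewards

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def belief_update(b, a, o):
    """Bayes update of belief b after taking action a and observing o."""
    nb = [Z[a][sp][o] * sum(T[a][s][sp] * b[s] for s in range(S))
          for sp in range(S)]
    norm = sum(nb)
    return None if norm == 0 else [x / norm for x in nb]

def backup(b, alphas):
    """One DP backup at belief point b, yielding a new alpha-vector."""
    best = None
    for a in range(A):
        vec = list(R[a])
        for o in range(O):
            # For each (a, o), choose the back-projected alpha-vector
            # that maximizes expected future value at b.
            g = max(([sum(T[a][s][sp] * Z[a][sp][o] * al[sp]
                          for sp in range(S)) for s in range(S)]
                     for al in alphas), key=lambda v: dot(b, v))
            vec = [vec[s] + gamma * g[s] for s in range(S)]
        if best is None or dot(b, vec) > dot(b, best):
            best = vec
    return best

# Start from a small belief subset; backups are confined to it, and it
# is expanded with one-step-reachable beliefs between stages.
beliefs = [[1.0, 0.0], [0.0, 1.0]]
alphas = [[0.0, 0.0]]             # lower-bound initialization
for stage in range(10):
    alphas = [backup(b, alphas) for b in beliefs]
    for b, a, o in itertools.product(list(beliefs), range(A), range(O)):
        nb = belief_update(b, a, o)
        # Add the new belief only if it is not too close to an existing one.
        if nb is not None and all(
                max(abs(nb[s] - b2[s]) for s in range(S)) > 0.05
                for b2 in beliefs):
            beliefs.append(nb)

value = max(dot([0.5, 0.5], al) for al in alphas)
print(len(beliefs), round(value, 2))
```

Because backups touch only the current belief subset, each stage is cheap, and the subset (and hence the quality of the value function on reachable beliefs) improves with time — the anytime flavor described in the abstract.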