Space-Progressive Value Iteration: An Anytime Algorithm for a Class of POMDPs

  • Authors:
  • Nevin Lianwen Zhang; Weihong Zhang


  • Venue:
  • ECSQARU '01 Proceedings of the 6th European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty
  • Year:
  • 2001

Abstract

Finding optimal policies for general partially observable Markov decision processes (POMDPs) is computationally difficult, primarily due to the need to perform dynamic-programming (DP) updates over the entire belief space. In this paper, we first study a somewhat restrictive class of special POMDPs called almost discernible POMDPs and propose an anytime algorithm called space-progressive value iteration (SPVI). SPVI does not perform DP updates over the entire belief space. Rather, it restricts DP updates to a belief subspace that grows over time. It is argued that, given sufficient time, SPVI can find near-optimal policies for almost discernible POMDPs. We then show how SPVI can be applied to a more general class of POMDPs. Empirical results are presented to show the effectiveness of SPVI.
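
To make the core idea concrete, the sketch below shows an anytime value-iteration loop that performs DP backups only over a finite set of belief points and periodically enlarges that set. This is a minimal illustration of the general "restrict DP updates to a growing belief subspace" idea, not the authors' SPVI algorithm: the model shapes for T, O, and R, the point-based backup, and the `expand` hook are assumptions made here for brevity.

```python
# Illustrative sketch only: generic point-based DP backups over a belief set
# that grows between rounds. Model conventions are assumed, not taken from
# the paper: T has shape (A, S, S'), O has shape (A, S', Z), R has shape (A, S).
import numpy as np

def backup(belief, V, T, O, R, gamma):
    """One DP backup at a single belief point.

    V is a value function represented as a list of alpha-vectors
    (each a length-|S| array); returns the best new alpha-vector for `belief`.
    """
    n_actions, n_obs = R.shape[0], O.shape[2]
    best_alpha, best_val = None, -np.inf
    for a in range(n_actions):
        alpha = R[a].astype(float).copy()
        for o in range(n_obs):
            # g_{a,o}(s) = sum_{s'} T[a, s, s'] * O[a, s', o] * alpha'(s')
            gao = [T[a] @ (O[a, :, o] * alpha_next) for alpha_next in V]
            # keep the alpha' that is best at this particular belief
            alpha += gamma * max(gao, key=lambda g: belief @ g)
        val = belief @ alpha
        if val > best_val:
            best_alpha, best_val = alpha, val
    return best_alpha

def anytime_subspace_vi(T, O, R, gamma, initial_beliefs, expand,
                        n_rounds=10, n_sweeps=20):
    """Anytime loop: alternate value-iteration sweeps restricted to the
    current belief set with expansion of that set (`expand` is a
    hypothetical hook returning additional belief points)."""
    beliefs = list(initial_beliefs)
    V = [np.zeros(T.shape[1])]              # start from a single zero alpha-vector
    for _ in range(n_rounds):
        for _ in range(n_sweeps):           # DP updates only over `beliefs`
            V = [backup(b, V, T, O, R, gamma) for b in beliefs]
        beliefs.extend(expand(beliefs, V))  # grow the belief subspace over time
    return V
```

Because each round returns a usable value function, the loop can be stopped at any time, which is what makes the scheme anytime; the quality of the resulting policy depends on how the belief subspace is chosen and grown.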