Dynamic programming: deterministic and stochastic models
Computationally feasible bounds for partially observed Markov decision processes
Operations Research
A survey of solution techniques for the partially observed Markov decision process
Annals of Operations Research
A survey of algorithmic methods for partially observed Markov decision processes
Annals of Operations Research
Planning and control
Acting optimally in partially observable stochastic domains
AAAI'94: Proceedings of the Twelfth National Conference on Artificial Intelligence, Volume 2
Dynamic Programming
The Witness Algorithm: Solving Partially Observable Markov Decision Processes
Optimal Policies for Partially Observable Markov Decision Processes
Decomposition Techniques for Planning in Stochastic Domains
Efficient dynamic-programming updates in partially observable Markov decision processes
Algorithms for partially observable Markov decision processes
Approximating optimal policies for partially observable stochastic domains
IJCAI'95: Proceedings of the 14th International Joint Conference on Artificial Intelligence, Volume 2
Exploiting structure in policy construction
IJCAI'95: Proceedings of the 14th International Joint Conference on Artificial Intelligence, Volume 2
Computing optimal policies for partially observable decision processes using compact representations
AAAI'96: Proceedings of the Thirteenth National Conference on Artificial Intelligence, Volume 2
A heuristic variable grid solution method for POMDPs
AAAI'97/IAAI'97: Proceedings of the Fourteenth National Conference on Artificial Intelligence and Ninth Conference on Innovative Applications of Artificial Intelligence
Incremental methods for computing bounds in partially observable Markov decision processes
AAAI'97/IAAI'97: Proceedings of the Fourteenth National Conference on Artificial Intelligence and Ninth Conference on Innovative Applications of Artificial Intelligence
Incremental pruning: a simple, fast, exact method for partially observable Markov decision processes
UAI'97: Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence
Complexity of finite-horizon Markov decision process problems
Journal of the ACM (JACM)
Value Iteration over Belief Subspace
ECSQARU'01: Proceedings of the 6th European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty
Space-Progressive Value Iteration: An Anytime Algorithm for a Class of POMDPs
ECSQARU'01: Proceedings of the 6th European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty
Hidden-Mode Markov Decision Processes for Nonstationary Sequential Decision Making
Sequence Learning - Paradigms, Algorithms, and Applications
Value iteration working with belief subset
Eighteenth National Conference on Artificial Intelligence
Discretized approximations for POMDP with average cost
UAI'04: Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence
Dominance and equivalence for sensor-based agents
AAAI'07: Proceedings of the 22nd National Conference on Artificial Intelligence, Volume 2
Value-function approximations for partially observable Markov decision processes
Journal of Artificial Intelligence Research
Speeding up the convergence of value iteration in partially observable Markov decision processes
Journal of Artificial Intelligence Research
Restricted value iteration: theory and algorithms
Journal of Artificial Intelligence Research
A method for speeding up value iteration in partially observable Markov decision processes
UAI'99: Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence
Planning with partially observable Markov decision processes: advances in exact solution method
UAI'98: Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence
APPSSAT: approximate probabilistic planning using stochastic satisfiability
ECSQARU'05: Proceedings of the 8th European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty
Partially observable Markov decision processes (POMDPs) are a natural model for planning problems where the effects of actions are nondeterministic and the state of the world is not completely observable. Solving POMDPs exactly is difficult. This paper proposes a new approximation scheme. The basic idea is to transform a POMDP into another one in which additional information is provided by an oracle: the oracle informs the planning agent that the current state of the world lies in a certain region. The transformed POMDP is consequently said to be region observable, and it is easier to solve than the original POMDP. We propose to solve the transformed POMDP and use its optimal policy to construct an approximate policy for the original POMDP. By controlling the amount of additional information the oracle provides, one can trade off computational time against approximation quality. In terms of algorithmic contributions, we study in detail how to exploit region observability in solving the transformed POMDP. To facilitate the study, we also propose a new exact algorithm for general POMDPs. The algorithm is conceptually simple and yet significantly more efficient than all previous exact algorithms.
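The oracle step described in the abstract can be illustrated with a minimal belief-update sketch. This is not the paper's algorithm, only the standard POMDP belief update followed by conditioning on the oracle's report that the true state lies in a given region; the function names and the toy model numbers are illustrative assumptions.

```python
def belief_update(belief, action, obs, T, O):
    """Standard POMDP belief update: b'(s') ∝ O[a][s'][o] * Σ_s T[a][s][s'] * b(s)."""
    n = len(belief)
    new_b = [O[action][s2][obs] * sum(T[action][s][s2] * belief[s] for s in range(n))
             for s2 in range(n)]
    z = sum(new_b)  # normalizing constant: probability of the observation
    return [p / z for p in new_b]

def restrict_to_region(belief, region):
    """Oracle step: the agent learns the true state lies in `region`
    (a set of state indices), so belief mass outside the region is zeroed
    and the remainder is renormalized."""
    masked = [p if s in region else 0.0 for s, p in enumerate(belief)]
    z = sum(masked)
    return [p / z for p in masked]

# Toy 3-state model with 1 action and 2 observations (numbers made up).
T = [[[0.7, 0.2, 0.1],
      [0.1, 0.8, 0.1],
      [0.2, 0.2, 0.6]]]          # T[a][s][s']
O = [[[0.9, 0.1],
      [0.5, 0.5],
      [0.2, 0.8]]]               # O[a][s'][o]

b = [1/3, 1/3, 1/3]                        # uniform initial belief
b = belief_update(b, action=0, obs=1, T=T, O=O)
b = restrict_to_region(b, region={1, 2})   # oracle: state is in {1, 2}
```

In the region-observable POMDP, beliefs are always supported on a single region, which is what shrinks the space of beliefs the solver must cover; smaller regions mean more oracle information and a cheaper but looser approximation.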