Speeding up the convergence of value iteration in partially observable Markov decision processes

Authors:
Nevin L. Zhang; Weihong Zhang
Affiliations:
Department of Computer Science, Hong Kong University of Science & Technology, Kowloon, Hong Kong, China (both authors)
Venue:
Journal of Artificial Intelligence Research
Year:
2001

Abstract

Partially observable Markov decision processes (POMDPs) have recently become popular among AI researchers because they serve as a natural model for planning under uncertainty. Value iteration is a well-known algorithm for finding optimal policies for POMDPs, but it typically takes a large number of iterations to converge. This paper proposes a method for accelerating the convergence of value iteration. The method was evaluated on an array of benchmark problems and found to be very effective: it enabled value iteration to converge after only a few iterations on all the test problems.
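
To illustrate why plain value iteration can need many iterations, the following is a minimal sketch and not the acceleration method proposed in the paper. As a simplifying assumption it uses a small, fully observable MDP as a stand-in for a POMDP; the two-state model, its numbers, and the names `value_iteration`, `P`, and `R` are all hypothetical. It only shows that the number of Bellman backups needed to reach a fixed precision grows sharply as the discount factor approaches 1.

```python
def value_iteration(P, R, gamma, eps=1e-6, max_iter=100_000):
    """Plain value iteration on a finite, fully observable MDP (illustrative only).

    P[a][s][t] is the probability of moving from state s to state t under action a.
    R[a][s] is the immediate reward for taking action a in state s.
    Returns the value estimate and the number of iterations performed.
    """
    n_actions = len(R)
    n_states = len(R[0])
    V = [0.0] * n_states
    # Standard stopping rule: stop once the Bellman residual is small enough
    # to guarantee an eps-accurate value function.
    threshold = eps * (1 - gamma) / (2 * gamma)
    for k in range(1, max_iter + 1):
        V_new = [
            max(
                R[a][s] + gamma * sum(P[a][s][t] * V[t] for t in range(n_states))
                for a in range(n_actions)
            )
            for s in range(n_states)
        ]
        residual = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if residual <= threshold:
            return V, k
    return V, max_iter


# Hypothetical two-state, two-action model (not taken from the paper).
P = [
    [[0.9, 0.1], [0.1, 0.9]],  # action 0: mostly stay in place
    [[0.5, 0.5], [0.5, 0.5]],  # action 1: jump to a uniformly random state
]
R = [
    [1.0, 0.0],  # rewards for action 0 in states 0 and 1
    [0.0, 0.5],  # rewards for action 1 in states 0 and 1
]

_, iters_fast = value_iteration(P, R, gamma=0.5)
_, iters_slow = value_iteration(P, R, gamma=0.99)
print("iterations at gamma=0.50:", iters_fast)
print("iterations at gamma=0.99:", iters_slow)
```

In a POMDP the same backup operates on value functions over belief states, so each iteration is far more expensive than in this toy model, which is why reducing the iteration count matters.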