Topological value iteration algorithms

Authors:
Peng Dai; Mausam; Daniel S. Weld; Judy Goldsmith
Affiliations:
Google Inc., Mountain View, CA; Department of Computer Science and Engineering, University of Washington, Seattle, WA; Department of Computer Science and Engineering, University of Washington, Seattle, WA; Department of Computer Science, University of Kentucky, Lexington, KY
Venue:
Journal of Artificial Intelligence Research
Year:
2011

Abstract

Value iteration (VI) is a powerful yet inefficient algorithm for Markov decision processes (MDPs) because it spends most of its effort backing up the entire state space, which is often unnecessary. Many approaches have been proposed to overcome this problem; among them, ILAO* and variants of RTDP are the state of the art. These methods use reachability analysis and heuristic search to avoid some unnecessary backups. However, none of them builds the graphical structure of the state transitions in a pre-processing step or uses that structural information to systematically decompose a problem and thereby generate an intelligent backup sequence over the state space. In this paper, we present two optimal MDP algorithms. The first, topological value iteration (TVI), detects the structure of an MDP and backs up states in topological order. It (1) divides the MDP into strongly connected components (SCCs) and (2) solves these components sequentially. TVI vastly outperforms VI and other state-of-the-art algorithms when an MDP has multiple SCCs of nearly equal size. The second algorithm, focused topological value iteration (FTVI), extends TVI. FTVI restricts its attention to the connected components that are relevant for solving the MDP. Specifically, it uses a small amount of heuristic search to eliminate provably sub-optimal actions; this pruning lets FTVI find smaller connected components and thus run faster. We demonstrate that FTVI outperforms TVI by an order of magnitude, averaged across several domains. Surprisingly, FTVI also significantly outperforms popular 'heuristically-informed' MDP algorithms such as ILAO*, LRTDP, BRTDP and Bayesian-RTDP in many domains, sometimes by as much as two orders of magnitude. Finally, we characterize the types of domains where FTVI excels, which suggests a way to make an informed choice of solver.
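
The two TVI steps named in the abstract (decompose into SCCs, then solve the components one after another) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a cost-minimizing MDP stored in a hypothetical dictionary of the form mdp[state][action] = [(probability, next_state, cost), ...], treats a state with no actions as a zero-cost goal, finds SCCs with Tarjan's algorithm (one standard choice), and runs plain value iteration inside each component. The names tarjan_sccs, topological_value_iteration, example_mdp and the threshold epsilon are illustrative only.

```python
def tarjan_sccs(successors):
    """Return the SCCs of a directed graph, sink components first.

    `successors` maps every node to an iterable of its successor nodes.
    Tarjan's algorithm emits components in reverse topological order,
    so each component appears after all components it can reach.
    Recursive for brevity; suitable for small graphs only.
    """
    index, low, on_stack = {}, {}, set()
    stack, sccs = [], []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in successors[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            sccs.append(component)

    for v in successors:
        if v not in index:
            visit(v)
    return sccs


def topological_value_iteration(mdp, epsilon=1e-6):
    """Solve a cost-minimizing MDP one SCC at a time (assumed data layout)."""
    # Transition graph: s -> every state reachable from s by some action.
    successors = {
        s: {t for outcomes in actions.values() for _, t, _ in outcomes}
        for s, actions in mdp.items()
    }
    values = {s: 0.0 for s in mdp}
    for component in tarjan_sccs(successors):
        # Values of all downstream components are already final here.
        while True:
            residual = 0.0
            for s in component:
                if not mdp[s]:  # assumed goal state: value stays 0
                    continue
                best = min(
                    sum(p * (cost + values[t]) for p, t, cost in outcomes)
                    for outcomes in mdp[s].values()
                )
                residual = max(residual, abs(best - values[s]))
                values[s] = best
            if residual < epsilon:
                break
    return values


# Hypothetical four-state example: s1 and s2 form one SCC.
example_mdp = {
    "s0": {"a": [(1.0, "s1", 1.0)]},
    "s1": {"a": [(0.5, "s2", 1.0), (0.5, "s1", 1.0)]},
    "s2": {"go": [(1.0, "goal", 2.0)], "back": [(1.0, "s1", 1.0)]},
    "goal": {},
}
values = topological_value_iteration(example_mdp)
print({s: round(v, 3) for s, v in values.items()})
```

Solving the example by hand gives V(goal) = 0, V(s2) = 2, V(s1) = 4 and V(s0) = 5, which the sketch should approach to within the chosen epsilon. Because the component {s0} is processed only after {s1, s2} has converged, s0 needs a single effective backup; that saving is the intuition behind ordering backups by SCC.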