Value iteration (VI) is a powerful yet inefficient algorithm for Markov decision processes (MDPs) because it puts the majority of its effort into backing up the entire state space, which turns out to be unnecessary in many cases. Many approaches have been proposed to overcome this problem; among them, ILAO* and variants of RTDP are the state of the art. These methods use reachability analysis and heuristic search to avoid some unnecessary backups. However, none of these approaches builds the graphical structure of the state transitions in a pre-processing step or uses the structural information to systematically decompose a problem, thereby generating an intelligent backup sequence of the state space. In this paper, we present two optimal MDP algorithms. The first algorithm, topological value iteration (TVI), detects the structure of MDPs and backs up states based on topological sequences. It (1) divides an MDP into strongly-connected components (SCCs), and (2) solves these components sequentially. TVI vastly outperforms VI and other state-of-the-art algorithms when an MDP has multiple, close-to-equal-sized SCCs. The second algorithm, focused topological value iteration (FTVI), is an extension of TVI. FTVI restricts its attention to connected components that are relevant for solving the MDP. Specifically, it uses a small amount of heuristic search to eliminate provably sub-optimal actions; this pruning allows FTVI to find smaller connected components and thus run faster. We demonstrate that FTVI outperforms TVI by an order of magnitude, averaged across several domains. Surprisingly, FTVI also significantly outperforms popular 'heuristically-informed' MDP algorithms such as ILAO*, LRTDP, BRTDP and Bayesian-RTDP in many domains, sometimes by as much as two orders of magnitude. Finally, we characterize the type of domains where FTVI excels, suggesting a way to make an informed choice of solver.
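To make the two TVI steps concrete, the following is a minimal sketch, not the authors' implementation. It rests on several assumptions that the abstract does not state: the MDP is a cost-minimizing stochastic shortest-path problem, it is encoded as a hypothetical dictionary from each state to its list of actions, goal states are those with no actions, SCCs are found with Tarjan's algorithm, values start at zero, and the stopping threshold `eps` is arbitrary. Tarjan's algorithm emits components successors-first, so each component is solved only after every component it can reach has been solved.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding (an assumption, not from the paper):
# state -> list of actions; each action is (cost, [(probability, successor), ...]).
# A state with no actions is treated as a goal with value 0.
# Every successor must itself appear as a key of the dictionary.
Action = Tuple[float, List[Tuple[float, str]]]
MDP = Dict[str, List[Action]]


def strongly_connected_components(mdp: MDP) -> List[List[str]]:
    """Tarjan's algorithm (recursive); components are emitted successors-first."""
    index: Dict[str, int] = {}
    low: Dict[str, int] = {}
    stack: List[str] = []
    on_stack = set()
    components: List[List[str]] = []

    def visit(s: str) -> None:
        index[s] = low[s] = len(index)
        stack.append(s)
        on_stack.add(s)
        for _cost, outcomes in mdp[s]:
            for _prob, t in outcomes:
                if t not in index:
                    visit(t)
                    low[s] = min(low[s], low[t])
                elif t in on_stack:
                    low[s] = min(low[s], index[t])
        if low[s] == index[s]:  # s is the root of a component
            comp = []
            while True:
                t = stack.pop()
                on_stack.discard(t)
                comp.append(t)
                if t == s:
                    break
            components.append(comp)

    for s in mdp:
        if s not in index:
            visit(s)
    return components


def backup(mdp: MDP, values: Dict[str, float], s: str) -> float:
    """Bellman backup for expected-cost minimization."""
    return min(cost + sum(p * values[t] for p, t in outcomes)
               for cost, outcomes in mdp[s])


def topological_value_iteration(mdp: MDP, eps: float = 1e-9) -> Dict[str, float]:
    """Run value iteration on one SCC at a time, successors first."""
    values = {s: 0.0 for s in mdp}
    for comp in strongly_connected_components(mdp):
        while True:
            residual = 0.0
            for s in comp:
                if not mdp[s]:
                    continue  # goal state keeps value 0
                new = backup(mdp, values, s)
                residual = max(residual, abs(new - values[s]))
                values[s] = new
            if residual < eps:
                break
    return values


# Toy problem: states a and b form a cycle, c leads to the goal g.
example: MDP = {
    "a": [(1.0, [(0.5, "b"), (0.5, "c")])],
    "b": [(1.0, [(0.5, "a"), (0.5, "c")])],
    "c": [(1.0, [(1.0, "g")])],
    "g": [],
}

if __name__ == "__main__":
    print(topological_value_iteration(example))
```

In the toy problem the components are {g}, {c} and {a, b}, solved in that order; the single-state components settle almost immediately, and only the two-state cycle needs repeated sweeps. Solving v = 1.5 + 0.5v by hand suggests the values of a and b should approach 3. FTVI's heuristic-search pruning of sub-optimal actions is not sketched here.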