Dynamic programming methods (including value iteration, LAO*, RTDP, and their derivatives) are popular algorithms for solving Markov decision processes (MDPs). However, these techniques store the MDP model extensionally in a table and are therefore limited by the amount of available main memory. Because the required space is exponential in the number of domain features, they are ineffective for large problems. To address this, Edelkamp et al. devised the external memory value iteration (EMVI) algorithm, which uses a clever sorting scheme to move parts of the model efficiently between disk and main memory. Although EMVI can handle larger problems than previously addressed, its need to perform external sorts repeatedly still limits scalability. This paper proposes a new approach: we partition an MDP into smaller pieces (blocks), keep only the relevant blocks in memory, and perform Bellman backups block by block. Experiments show that our algorithm solves large MDPs an order of magnitude faster than EMVI.
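The idea of performing Bellman backups block by block can be illustrated with a minimal sketch. This is not the algorithm evaluated in the paper: it keeps everything in memory and only marks where a block would be loaded from disk. It assumes a cost-minimizing MDP in which states with no actions are zero-cost goals, and it takes the partition as given. The names `bellman_backup`, `partitioned_value_iteration`, `blocks`, `max_sweeps`, and `max_inner` are hypothetical and chosen for illustration only.

```python
from typing import Dict, List, Tuple

# Assumed model layout: mdp[state][action] -> list of (probability, next_state, cost).
# A state with no actions is treated as a goal with value 0.
MDP = Dict[str, Dict[str, List[Tuple[float, str, float]]]]


def bellman_backup(mdp: MDP, values: Dict[str, float], state: str, gamma: float) -> float:
    """Return the minimum expected cost-to-go of `state` over its actions."""
    actions = mdp[state]
    if not actions:
        return 0.0
    return min(
        sum(p * (cost + gamma * values[nxt]) for p, nxt, cost in outcomes)
        for outcomes in actions.values()
    )


def partitioned_value_iteration(
    mdp: MDP,
    blocks: List[List[str]],
    gamma: float = 1.0,
    eps: float = 1e-9,
    max_sweeps: int = 1000,
    max_inner: int = 1000,
) -> Dict[str, float]:
    """Sweep over the blocks, backing up one block at a time until all are stable."""
    values = {s: 0.0 for s in mdp}
    for _ in range(max_sweeps):
        sweep_residual = 0.0
        for block in blocks:
            # A disk-based solver would load this block's states and
            # transitions here; the sketch keeps the whole model in memory.
            for _ in range(max_inner):
                block_residual = 0.0
                for s in block:
                    new_value = bellman_backup(mdp, values, s, gamma)
                    block_residual = max(block_residual, abs(new_value - values[s]))
                    values[s] = new_value
                sweep_residual = max(sweep_residual, block_residual)
                if block_residual < eps:
                    break  # block is locally converged; move to the next one
        if sweep_residual < eps:
            break  # no block changed by more than eps in a full sweep
    return values


if __name__ == "__main__":
    # Toy model (an assumption for illustration): a -> b -> c -> goal,
    # where b's only action stays in b with probability 0.5.
    toy: MDP = {
        "a": {"go": [(1.0, "b", 1.0)]},
        "b": {"go": [(0.5, "c", 1.0), (0.5, "b", 1.0)]},
        "c": {"go": [(1.0, "goal", 1.0)]},
        "goal": {},
    }
    print(partitioned_value_iteration(toy, [["a", "b"], ["c", "goal"]]))
```

Iterating each block to local convergence before moving on means a block is revisited only when values in other blocks have changed, which is the property that would limit disk traffic in an external-memory setting.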