This paper presents an algorithm for finding approximately optimal policies in very large Markov decision processes (MDPs) by constructing a hierarchical model and then solving it approximately. The algorithm exploits factored representations to achieve compactness and efficiency and to discover connectivity properties of the domain. We provide a bound on the quality of the solutions, give an asymptotic analysis of the running times, and demonstrate performance on a collection of very large domains. The results show that the resulting policies are of very good quality and that the total running time, for both creating and solving the hierarchy, is significantly less than that of an optimal factored MDP solver.
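For readers unfamiliar with the setting, the sketch below shows what solving a small, flat MDP exactly looks like using standard value iteration. It is not the hierarchical algorithm described in the abstract; it only illustrates the kind of exact computation that becomes infeasible on very large state spaces and that hierarchical, approximate methods aim to avoid. The toy MDP, the state and action names, the discount factor, and the function `value_iteration` are all hypothetical and chosen purely for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding (an assumption, not taken from the paper):
# transitions[state][action] = list of (probability, next_state, reward)
Transitions = Dict[str, Dict[str, List[Tuple[float, str, float]]]]


def q_value(outcomes: List[Tuple[float, str, float]],
            values: Dict[str, float], gamma: float) -> float:
    """Expected discounted return of one action given current value estimates."""
    return sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)


def value_iteration(transitions: Transitions, gamma: float = 0.9,
                    tol: float = 1e-10) -> Tuple[Dict[str, float], Dict[str, str]]:
    """Standard value iteration on a flat, explicitly enumerated MDP."""
    values = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            if not actions:  # terminal state: value stays 0
                continue
            best = max(q_value(o, values, gamma) for o in actions.values())
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged values
    policy = {
        s: max(actions, key=lambda a: q_value(actions[a], values, gamma))
        for s, actions in transitions.items() if actions
    }
    return values, policy


# Toy three-state MDP (illustrative assumption)
toy_mdp: Transitions = {
    "start": {"go": [(1.0, "mid", 0.0)], "stay": [(1.0, "start", 0.0)]},
    "mid": {"go": [(0.8, "goal", 1.0), (0.2, "start", 0.0)],
            "stay": [(1.0, "mid", 0.0)]},
    "goal": {},  # terminal
}

values, policy = value_iteration(toy_mdp)
print(policy)
```

Each sweep of this loop touches every state and every action, so its cost grows with the size of the enumerated state space. That growth is the motivation for compact factored representations and hierarchical decomposition of the kind the abstract describes.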