Proceedings of the Seventh International Conference on Machine Learning (1990).
Technical Note: Q-Learning. Machine Learning.
Introduction to Reinforcement Learning.
Near-Optimal Reinforcement Learning in Polynomial Time. Machine Learning.
Discovering Hierarchy in Reinforcement Learning with HEXQ. Proceedings of the Nineteenth International Conference on Machine Learning (ICML '02).
Algorithm-Directed Exploration for Model-Based Reinforcement Learning in Factored MDPs. Proceedings of the Nineteenth International Conference on Machine Learning (ICML '02).
Efficient Reinforcement Learning in Factored MDPs. Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence (IJCAI '99).
Policy Iteration for Factored MDPs. Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence (UAI '00).
R-max - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning. The Journal of Machine Learning Research.
Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition. Journal of Artificial Intelligence Research.
Hierarchical Model-Based Reinforcement Learning: R-max + MAXQ. Proceedings of the Twenty-Fifth International Conference on Machine Learning.
Automatic Discovery and Transfer of MAXQ Hierarchies. Proceedings of the Twenty-Fifth International Conference on Machine Learning.
Factored representations, model-based learning, and hierarchy are well-studied techniques for improving the learning efficiency of reinforcement-learning algorithms in large state spaces. We bring these three ideas together in a new algorithm that addresses two open problems from the reinforcement-learning literature, solving both in deterministic domains. First, we show how learned models can improve learning speed in the hierarchical MAXQ framework without disrupting opportunities for state abstraction. Second, we show how hierarchies can augment existing factored exploration algorithms to achieve not only low sample complexity but also provably efficient planning. We illustrate the resulting performance gains in example domains and prove polynomial bounds on the computational effort needed to attain near-optimal performance within the hierarchy.
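To make the model-based ingredient of the abstract concrete, the following is a minimal sketch, assuming a deterministic flat MDP, of R-max-style optimistic exploration: one observation of a state-action pair fully determines its transition and reward, and every unvisited pair is assigned the optimistic value R_max/(1 - gamma). This is an illustration only, not the paper's combined hierarchical, factored algorithm; the class name `RMaxDeterministic`, the constants, and the environment interface are hypothetical.

```python
from collections import defaultdict

GAMMA = 0.95                    # discount factor (assumed)
R_MAX = 1.0                     # assumed bound on one-step reward
V_OPT = R_MAX / (1.0 - GAMMA)   # optimistic value for unknown pairs

class RMaxDeterministic:
    """R-max-style learner for a deterministic MDP: one visit to a
    state-action pair fully determines its transition and reward."""

    def __init__(self, states, actions):
        self.states = list(states)
        self.actions = list(actions)
        self.next_state = {}    # (s, a) -> observed successor state
        self.reward = {}        # (s, a) -> observed reward

    def observe(self, s, a, r, s2):
        # In a deterministic domain, a single sample suffices to
        # learn this entry of the model exactly.
        self.next_state[(s, a)] = s2
        self.reward[(s, a)] = r

    def q_value(self, values, s, a):
        if (s, a) not in self.next_state:
            return V_OPT        # optimism in the face of uncertainty
        return self.reward[(s, a)] + GAMMA * values[self.next_state[(s, a)]]

    def plan(self, sweeps=200):
        # Value iteration over the learned, optimistically completed model.
        values = defaultdict(float)
        for _ in range(sweeps):
            for s in self.states:
                values[s] = max(self.q_value(values, s, a)
                                for a in self.actions)
        return values

    def greedy_action(self, s):
        # Acting greedily on the optimistic values steers the agent
        # toward state-action pairs it has not yet tried.
        values = self.plan()
        return max(self.actions, key=lambda a: self.q_value(values, s, a))

# Hypothetical usage inside an environment loop:
agent = RMaxDeterministic(states=range(5), actions=[0, 1])
action = agent.greedy_action(0)     # all pairs unknown, so it explores
agent.observe(0, action, 0.0, 1)    # record the deterministic outcome
```

The paper's contribution layers this kind of one-sample model learning under a MAXQ hierarchy and a factored state representation; the sketch omits both and shows only why determinism keeps the model-learning step simple.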