One of the major difficulties in applying Q-learning to real-world domains is the sharp increase in the number of learning steps required to converge towards an optimal policy as the state space grows. In this paper we propose a method, PLANQ-learning, that couples a Q-learner with a STRIPS planner. The planner shapes the reward function and thus guides the Q-learner quickly to the optimal policy. We demonstrate empirically that this combination of high-level reasoning and low-level learning scales significantly better with state-space size than both standard Q-learning and hierarchical Q-learning methods.
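The core mechanism — a planner-derived signal shaping the Q-learner's reward — can be illustrated with a minimal sketch. This is not the paper's implementation: the domain (a 1-D corridor), the hyperparameters, and the potential function standing in for the STRIPS plan (negative distance to goal, i.e. remaining plan length) are all illustrative assumptions. It uses standard potential-based reward shaping on top of a tabular Q-learning update.

```python
import random

N = 10               # corridor states 0..N-1, goal at N-1 (hypothetical domain)
ACTIONS = [+1, -1]   # move right / left

def potential(s):
    # Stand-in for the STRIPS plan: the potential is the negative number of
    # remaining plan steps (here, distance to the goal state).
    return -(N - 1 - s)

def step(s, a):
    # Deterministic transition with walls at both ends; reward only at the goal.
    s2 = min(max(s + a, 0), N - 1)
    r = 1.0 if s2 == N - 1 else 0.0
    return s2, r, s2 == N - 1

def train(shaped, episodes=500, alpha=0.5, gamma=0.95, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        for _ in range(100):
            # Epsilon-greedy action selection.
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: Q[(s, x)])
            s2, r, done = gettrans = step(s, a)
            s2, r, done = gettrans
            if shaped:
                # Potential-based shaping: the plan's progress signal is added
                # to the environment reward.
                r += gamma * potential(s2) - potential(s)
            target = r if done else r + gamma * max(Q[(s2, b)] for b in ACTIONS)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
            if done:
                break
    # Evaluate the greedy policy: number of steps from start to goal.
    s, steps = 0, 0
    while s != N - 1 and steps < 100:
        s = step(s, max(ACTIONS, key=lambda x: Q[(s, x)]))[0]
        steps += 1
    return steps

print(train(shaped=True))  # greedy path length from start to goal
```

Because the shaping term rewards each step of plan progress immediately, the learner receives informative feedback long before it ever reaches the goal; without shaping, the single terminal reward must propagate backwards one state per episode.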