One of the major difficulties in applying Q-learning to real-world domains is the sharp increase in the number of learning steps required to converge towards an optimal policy as the state space grows. In this paper we propose a method, PLANQ-learning, that couples a Q-learner with a STRIPS planner. The planner shapes the reward function and thus guides the Q-learner quickly to the optimal policy. We demonstrate empirically that this combination of high-level reasoning and low-level learning scales significantly better with state-space size than both standard Q-learning and hierarchical Q-learning methods.
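The core mechanism — a planner-derived signal shaping the Q-learner's reward — can be illustrated with a minimal sketch. This is not the paper's implementation: the domain (a 1-D corridor), the hyperparameters, and the potential function standing in for the STRIPS plan (negative distance to goal, i.e. remaining plan length) are all illustrative assumptions. It uses standard potential-based reward shaping on top of a tabular Q-learning update.

```python
import random

N = 10               # corridor states 0..N-1, goal at N-1 (hypothetical domain)
ACTIONS = [+1, -1]   # move right / left

def potential(s):
    # Stand-in for the STRIPS plan: the potential is the negative number of
    # remaining plan steps (here, distance to the goal state).
    return -(N - 1 - s)

def step(s, a):
    # Deterministic transition with walls at both ends; reward only at the goal.
    s2 = min(max(s + a, 0), N - 1)
    r = 1.0 if s2 == N - 1 else 0.0
    return s2, r, s2 == N - 1

def train(shaped, episodes=500, alpha=0.5, gamma=0.95, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        for _ in range(100):
            # Epsilon-greedy action selection.
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: Q[(s, x)])
            s2, r, done = gettrans = step(s, a)
            s2, r, done = gettrans
            if shaped:
                # Potential-based shaping: the plan's progress signal is added
                # to the environment reward.
                r += gamma * potential(s2) - potential(s)
            target = r if done else r + gamma * max(Q[(s2, b)] for b in ACTIONS)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
            if done:
                break
    # Evaluate the greedy policy: number of steps from start to goal.
    s, steps = 0, 0
    while s != N - 1 and steps < 100:
        s = step(s, max(ACTIONS, key=lambda x: Q[(s, x)]))[0]
        steps += 1
    return steps

print(train(shaped=True))  # greedy path length from start to goal
```

Because the shaping term rewards each step of plan progress immediately, the learner receives informative feedback long before it ever reaches the goal; without shaping, the single terminal reward must propagate backwards one state per episode.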