Combining reinforcement learning with symbolic planning

  • Authors:
  • Matthew Grounds; Daniel Kudenko

  • Affiliations:
  • Department of Computer Science, University of York, York, UK (both authors)

  • Venue:
  • ALAMAS'05/ALAMAS'06/ALAMAS'07: Proceedings of the 5th, 6th and 7th European Conference on Adaptive and Learning Agents and Multi-Agent Systems: Adaptation and Multi-Agent Learning
  • Year:
  • 2005


Abstract

One of the major difficulties in applying Q-learning to real-world domains is the sharp increase in the number of learning steps required to converge towards an optimal policy as the size of the state space grows. In this paper we propose a method, PLANQ-learning, that couples a Q-learner with a STRIPS planner. The planner shapes the reward function, and thus guides the Q-learner quickly to the optimal policy. We demonstrate empirically that this combination of high-level reasoning and low-level learning displays significant improvements in scaling-up behaviour as the state space grows larger, compared to both standard Q-learning and hierarchical Q-learning methods.
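
To make the shaping idea concrete, below is a minimal sketch (not the authors' implementation) of how a planner-produced subgoal sequence can shape a Q-learner's reward: the agent receives a bonus each time it achieves the next step of the plan, and the plan index is folded into the learner's state so the shaping remains consistent. The corridor domain, the PLAN list standing in for STRIPS planner output, the bonus value, and all hyperparameters are illustrative assumptions.

```python
import random
from collections import defaultdict

# Sketch of plan-shaped Q-learning on a 1-D corridor.
# PLAN stands in for the output of a STRIPS planner: an ordered
# sequence of subgoal states. All values here are illustrative.

N = 20                       # corridor states 0 .. N-1; goal is N-1
ACTIONS = (-1, +1)           # move left / move right
PLAN = [5, 10, 15, N - 1]    # subgoals a planner might emit, in order
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1
BONUS = 1.0                  # shaping reward for completing a plan step

def env_step(state, action):
    nxt = max(0, min(N - 1, state + action))
    return nxt, (1.0 if nxt == N - 1 else 0.0), nxt == N - 1

def run_episode(q, shaped=True, max_steps=500):
    state, k = 0, 0          # k indexes the next unmet plan step
    for _ in range(max_steps):
        # epsilon-greedy over the plan-augmented state (state, k)
        if random.random() < EPS:
            a = random.randrange(len(ACTIONS))
        else:
            a = max(range(len(ACTIONS)), key=lambda i: q[(state, k, i)])
        nxt, r, done = env_step(state, ACTIONS[a])
        nk = k
        if shaped and k < len(PLAN) and nxt == PLAN[k]:
            r += BONUS       # planner-derived shaping signal
            nk = k + 1       # advance to the next plan step
        target = r if done else r + GAMMA * max(
            q[(nxt, nk, i)] for i in range(len(ACTIONS)))
        q[(state, k, a)] += ALPHA * (target - q[(state, k, a)])
        state, k = nxt, nk
        if done:
            return True
    return False

if __name__ == "__main__":
    q = defaultdict(float)
    for episode in range(200):
        run_episode(q, shaped=True)
```

Folding the plan index into the Q-table's key is one simple way to keep the shaped reward Markov with respect to the augmented state; the paper itself couples the learner and planner more tightly, so this sketch only illustrates the general reward-shaping mechanism.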