Automatic construction of temporally extended actions for MDPs using bisimulation metrics

  • Authors: Pablo Samuel Castro; Doina Precup
  • Affiliations: School of Computer Science, McGill University, Canada (both authors)
  • Venue: EWRL'11 Proceedings of the 9th European conference on Recent Advances in Reinforcement Learning
  • Year: 2011

Abstract

Temporally extended actions are usually effective in speeding up reinforcement learning. In this paper we present a mechanism for automatically constructing such actions, expressed as options [24], in a finite Markov Decision Process (MDP). To do this, we compute a bisimulation metric [7] between the states in a small MDP and the states in a large MDP, which we want to solve. The shape of this metric is then used to completely define a set of options for the large MDP. We demonstrate empirically that our approach is able to improve the speed of reinforcement learning, and is generally not sensitive to parameter tuning.
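
To make the idea of a metric between the states of two MDPs concrete, the following is a minimal sketch of a fixed-point computation of a bisimulation-style distance. It is not the paper's algorithm: it assumes deterministic transitions (so the Kantorovich term of a bisimulation metric collapses to the distance between the two successor states), a shared action set, and illustrative weights. The function name `bisimulation_distance` and the parameters `c_r`, `c_t`, `tol`, and `max_iter` are hypothetical choices made for this example.

```python
def bisimulation_distance(R1, T1, R2, T2, c_r=1.0, c_t=0.5,
                          tol=1e-8, max_iter=10_000):
    """Distance between every state of MDP 1 and every state of MDP 2.

    Assumptions (illustrative only, not taken from the paper):
      * both MDPs are deterministic and share the same action indices;
      * R[s][a] is the reward and T[s][a] the successor state for
        taking action a in state s;
      * c_t < 1, so the update below is a contraction and converges.
    """
    n1, n2 = len(R1), len(R2)
    n_actions = len(R1[0])
    d = [[0.0] * n2 for _ in range(n1)]
    for _ in range(max_iter):
        # d(s, t) = max_a ( c_r * |reward gap| + c_t * d(successors) )
        new_d = [
            [
                max(
                    c_r * abs(R1[s][a] - R2[t][a])
                    + c_t * d[T1[s][a]][T2[t][a]]
                    for a in range(n_actions)
                )
                for t in range(n2)
            ]
            for s in range(n1)
        ]
        delta = max(
            abs(new_d[s][t] - d[s][t])
            for s in range(n1)
            for t in range(n2)
        )
        d = new_d
        if delta < tol:
            break
    return d


# Toy example: a 2-state "small" chain and a 3-state "large" chain,
# each with a single action and a rewarding absorbing final state.
small_R, small_T = [[0.0], [1.0]], [[1], [1]]
large_R, large_T = [[0.0], [0.0], [1.0]], [[1], [2], [2]]
dist = bisimulation_distance(small_R, small_T, large_R, large_T)
```

In this toy example, row `dist[s]` holds the distance from small-MDP state `s` to each large-MDP state; a distance of zero means the two states are behaviorally indistinguishable under the assumptions above. Handling stochastic transitions would require replacing the successor lookup with an optimal-transport (Kantorovich) computation.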