Optimal policy switching algorithms for reinforcement learning

  • Authors:
  • Gheorghe Comanici;Doina Precup

  • Affiliations:
  • McGill University, Montreal, QC, Canada;McGill University, Montreal, QC, Canada

  • Venue:
  • Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems: volume 1 - Volume 1
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

We address the problem of single-agent, autonomous sequential decision making. We assume that some controllers or behavior policies are given as prior knowledge, and the task of the agent is to learn how to switch between these policies. We formulate the problem using the framework of reinforcement learning and options (Sutton, Precup & Singh, 1999; Precup, 2000). We derive gradient-based algorithms for learning the termination conditions of options, with the goal of optimizing the expected long-term return. We incorporate the proposed approach into policy-gradient methods with linear function approximation.