Probabilistic Policy Reuse for inter-task transfer learning

  • Authors:
  • Fernando Fernández, Javier García, Manuela Veloso

  • Affiliations:
  • Universidad Carlos III de Madrid, 28911 Leganés, Madrid, Spain (Fernández, García); Carnegie Mellon University, Pittsburgh, United States (Veloso)

  • Venue:
  • Robotics and Autonomous Systems
  • Year:
  • 2010

Abstract

Policy Reuse is a reinforcement learning technique that efficiently learns a new policy by reusing past, similar learned policies. The Policy Reuse learner improves its exploration by probabilistically including the exploitation of those past policies. Policy Reuse was introduced, and its effectiveness previously demonstrated, in problems with different reward functions in the same state and action spaces. In this article, we contribute Policy Reuse as a mechanism for transfer learning among different domains. We introduce extended Markov Decision Processes (MDPs) that include domains and tasks, where domains have different state and action spaces, and tasks are problems with different reward functions within a domain. We show how Policy Reuse can be applied across domains by defining and using a mapping between their state and action spaces. On several domains, defined as versions of the simulated RoboCup Keepaway problem, we show that Policy Reuse can serve as a transfer learning mechanism, significantly outperforming a basic policy learner.
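
The exploration mechanism the abstract refers to, the π-reuse strategy, can be stated concretely. Below is a minimal sketch, not the article's implementation: it assumes a tabular Q-learner, and `env`, `past_policy`, `map_state`, and `map_action` are hypothetical placeholders for the environment, a previously learned policy, and the inter-domain state/action mappings the article defines. The full method also selects among several past policies according to their estimated reuse gain, which this sketch omits.

```python
import random
from collections import defaultdict

def pi_reuse_episode(env, Q, past_policy, map_state, map_action,
                     psi=1.0, nu=0.95, epsilon=0.1,
                     alpha=0.1, gamma=0.95, horizon=100):
    """One learning episode with a pi-reuse-style exploration strategy.

    With probability psi the agent exploits the past policy (its state
    and action translated through the inter-domain mappings); otherwise
    it acts epsilon-greedily on the Q-function of the new task. psi
    decays by nu after every step, so reuse fades as the episode runs.
    """
    s = env.reset()
    total_reward = 0.0
    for _ in range(horizon):
        if random.random() < psi:
            # Exploit the past policy: translate the current state into
            # the old domain, query the old policy, map the action back.
            a = map_action(past_policy(map_state(s)))
        elif random.random() < epsilon:
            a = random.choice(env.actions)                  # explore
        else:
            a = max(env.actions, key=lambda x: Q[(s, x)])   # exploit new Q
        s2, r, done = env.step(a)
        # Standard Q-learning update on the new task.
        best_next = max(Q[(s2, a2)] for a2 in env.actions)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        total_reward += r
        psi *= nu
        s = s2
        if done:
            break
    return total_reward

# Usage sketch: Q = defaultdict(float); run pi_reuse_episode repeatedly.
```

The decay of `psi` is the key design choice: the learner leans on the transferred policy early in each episode, then falls back to its own Q-values as the episode progresses, so a poorly matched past policy degrades into ordinary ε-greedy exploration rather than misleading the learner indefinitely.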