Reusing Old Policies to Accelerate Learning on New MDPs

  • Authors:
  • D. S. Bernstein

  • Affiliations:
  • -

  • Venue:
  • -
  • Year:
  • 1999

Abstract

We consider the reuse of policies for previous MDPs in learning on a new MDP, under the assumption that the vector of parameters of each MDP is drawn from a fixed probability distribution. We use the options framework, in which an option consists of a set of initiation states, a policy, and a termination condition. We use an option called a "reuse option", for which the set of initiation states is the set of all states, the policy is a combination of policies from the old MDPs, and the termination condition is based on the number of time steps since the option was initiated. Given policies for $m$ of the MDPs from the distribution, we construct reuse options from the policies and compare performance on an $(m+1)$st MDP both with and without various reuse options. We find that reuse options can speed initial learning of the $(m+1)$st task. We also present a distribution of MDPs for which reuse options can slow initial learning. We discuss reasons for this and suggest other ways to design reuse options.
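
The abstract characterizes a reuse option by its three option-framework components: an initiation set containing every state, a policy that combines the policies learned on the old MDPs, and a termination condition based on elapsed time steps. The sketch below illustrates that structure only; it is not the paper's implementation. The class name `ReuseOption`, the tabular state-to-action policy representation, and the majority-vote rule for combining old policies are assumptions made here for concreteness, since the abstract does not specify the combination mechanism.

```python
import random


class ReuseOption:
    """Minimal sketch of a reuse option in the options framework.

    An option is a triple (initiation set, policy, termination condition).
    For a reuse option: the initiation set is all states, the policy is a
    combination of the old MDPs' policies, and termination depends on the
    number of time steps since the option was initiated.
    """

    def __init__(self, old_policies, max_steps):
        # old_policies: list of dicts mapping state -> action, one per old MDP
        # max_steps: horizon after which the option terminates (assumption:
        # a fixed cutoff; the paper only says termination is time-step based)
        self.old_policies = old_policies
        self.max_steps = max_steps
        self.steps_taken = 0

    def can_initiate(self, state):
        # Initiation set is the set of all states.
        return True

    def initiate(self):
        # Reset the step counter when the option is invoked.
        self.steps_taken = 0

    def act(self, state):
        # Combine old policies by majority vote over their recommended
        # actions (an assumed rule, not taken from the paper), breaking
        # ties uniformly at random. Assumes each policy covers every state.
        votes = {}
        for policy in self.old_policies:
            action = policy[state]
            votes[action] = votes.get(action, 0) + 1
        self.steps_taken += 1
        best = max(votes.values())
        return random.choice([a for a, v in votes.items() if v == best])

    def should_terminate(self):
        # Termination condition: the option ends once max_steps have elapsed.
        return self.steps_taken >= self.max_steps
```

In use, a learner on the $(m+1)$st MDP would treat such an object as one more (temporally extended) action: when selected, the option executes `act` on successive states until `should_terminate` returns true, at which point control returns to the base learning algorithm.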