Learning to generalize and reuse skills using approximate partial policy homomorphisms

  • Authors:
  • Srividhya Rajendran;Manfred Huber

  • Affiliations:
  • Department of Computer Science and Engineering, The University of Texas at Arlington, Arlington, Texas;Department of Computer Science and Engineering, The University of Texas at Arlington, Arlington, Texas

  • Venue:
  • SMC'09: Proceedings of the 2009 IEEE International Conference on Systems, Man and Cybernetics
  • Year:
  • 2009

Abstract

A reinforcement learning (RL) agent that performs successfully in a complex and dynamic environment has to continuously learn and adapt to perform new tasks. This requires the agent not only to extract control and representational knowledge from the tasks it has learned, but also to reuse that knowledge to learn new tasks. This paper presents a new method for extracting this control and representational knowledge: a policy generalization approach that uses the novel concept of a policy homomorphism to derive these abstractions. The paper further extends the policy homomorphism framework to approximate policy homomorphisms. This extension allows the policy generalization framework to efficiently address more realistic tasks and environments in nondeterministic domains. An approximate policy homomorphism derives an abstract policy for a set of similar tasks (a task type) from a set of basic policies learned for previously seen task instances. The resulting generalized policy is then applied in new contexts to address new instances of related tasks. The approach also makes it possible to identify similar tasks based on the functional characteristics of the corresponding skills, and it provides a means of transferring learned knowledge to new situations without requiring complete knowledge of the state space and system dynamics of the new environment. We demonstrate policy abstraction using approximate policy homomorphisms and illustrate policy reuse for learning new tasks in novel situations using a set of grid world examples.