Hierarchical task decomposition through symbiosis in reinforcement learning

  • Authors:
  • John A. Doucette; Peter Lichodzijewski; Malcolm I. Heywood

  • Affiliations:
  • University of Waterloo, Waterloo, ON, Canada; Dalhousie University, Halifax, NS, Canada; Dalhousie University, Halifax, NS, Canada

  • Venue:
  • Proceedings of the 14th Annual Conference on Genetic and Evolutionary Computation (GECCO '12)
  • Year:
  • 2012

Abstract

Adopting a symbiotic model of evolution separates the context for deploying an action from the action itself. This separation provides a mechanism for task decomposition in temporal sequence learning. Previously learned policies are then treated as meta-actions, that is, actions that are themselves policies. If solutions to the task are not forthcoming in an initial round of evolution, the policies from that round become the meta-actions for a new round, providing the basis for evolving policy trees. A benchmarking study is performed on the Acrobot handstand task. Reinforcement learning solutions to date have not approached the performance established 14 years ago by an A* search equipped with a priori knowledge of the Acrobot energy equations. The proposed symbiotic approach matches and, for the first time, betters those results. Moreover, unlike previous work, solutions are tested under a broad range of Acrobot initial conditions, with hierarchical solutions providing significantly better generalization performance.
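
To make the policy-tree idea concrete, the following Python sketch illustrates the general mechanism the abstract describes: learners bid for the right to act in a given state, and a winning learner's action is either an atomic action or a meta-action that delegates to a previously evolved policy. All names here are hypothetical, and the bid functions are toy placeholders; this is a minimal illustration of the concept, not the authors' symbiotic bid-based implementation.

```python
# Minimal sketch of a policy tree with meta-actions (hypothetical names,
# toy bid functions). Internal nodes are policies whose learners point at
# either atomic actions or previously evolved policies (meta-actions);
# evaluating a state descends the tree until an atomic action is reached.

from dataclasses import dataclass, field
from typing import Callable, List, Union

State = List[float]

@dataclass
class Learner:
    """One symbiont: a context (bid) function paired with an action."""
    bid: Callable[[State], float]      # how strongly this learner claims the state
    action: Union[int, "Policy"]       # atomic action or meta-action (sub-policy)

@dataclass
class Policy:
    """A team of learners; the highest bidder wins the right to act."""
    learners: List[Learner] = field(default_factory=list)

    def act(self, state: State) -> int:
        winner = max(self.learners, key=lambda l: l.bid(state))
        # Meta-action: delegate the decision to a previously evolved policy.
        if isinstance(winner.action, Policy):
            return winner.action.act(state)
        return winner.action

# Round 1 evolves flat policies over atomic actions (e.g. torque levels
# 0, 1, 2); a later round reuses them as meta-actions, yielding a
# two-level policy tree.
swing_up = Policy([Learner(bid=lambda s: s[0], action=0),
                   Learner(bid=lambda s: -s[0], action=2)])
balance  = Policy([Learner(bid=lambda s: abs(s[1]), action=1)])
root     = Policy([Learner(bid=lambda s: 1.0 - abs(s[1]), action=swing_up),
                   Learner(bid=lambda s: abs(s[1]), action=balance)])

print(root.act([0.3, 0.9]))  # descends root -> balance -> atomic action 1
```

One consequence of this structure, reflected in the sketch, is that each round of evolution only has to learn a mapping from context to a small set of (possibly meta) actions; the depth of the resulting tree grows only when earlier rounds fail to solve the task outright.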