Hierarchical reinforcement learning: a hybrid approach

  • Authors: Malcolm Ross Kinsella Ryan; Claude Sammut

  • Venue: PhD thesis, University of New South Wales
  • Year: 2004

Abstract

In this thesis we investigate the relationship between the symbolic and sub-symbolic methods used for controlling agents in artificial intelligence, focusing in particular on methods that learn. In light of the strengths and weaknesses of each approach, we propose a hybridisation of symbolic and sub-symbolic methods that capitalises on the best features of each. We implement such a hybrid system, called RACHEL, which incorporates techniques from Teleo-Reactive Planning, Hierarchical Reinforcement Learning and Inductive Logic Programming.

RACHEL uses a novel behaviour representation, the Reinforcement-Learnt Teleo-operator (RL-TOP), which defines a behaviour in terms of its desired consequences while leaving the implementation of its policy to be learnt by reinforcement learning. An RL-TOP is a symbolic description of the purpose of a behaviour, and is used by RACHEL both as a planning operator and as the definition of a reward function.

Two new hierarchical reinforcement learning algorithms are introduced: Planned Hierarchical Semi-Markov Q-Learning and Teleo-Reactive Q-Learning. The former extends the Hierarchical Semi-Markov Q-Learning algorithm to use computer-generated plans in place of the task hierarchies commonly provided by a trainer. The latter elaborates the former with more intelligent behaviour termination: the knowledge contained in the plan is used to determine when an executing behaviour is no longer appropriate, resulting in more efficient policies.

Incomplete descriptions of the effects of behaviours can lead the planner to make false assumptions when building plans. Because behaviours are learnt rather than hand-implemented, not every effect of an action can be known in advance. RACHEL therefore implements a "reflector" which monitors execution for such unwanted side-effects. Using ILP, it learns to predict when they will occur, and so repairs its plans to avoid them.

Together, the components of RACHEL form a system which is able to receive goals and build plans which will achieve them, learn concrete policies and optimal choices of behaviour, discover and predict any unwanted side-effects that result, and repair its plans to avoid them. It demonstrates that different approaches to AI, symbolic and sub-symbolic, can be elegantly combined into a single agent architecture.
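To make the RL-TOP idea concrete, here is a minimal sketch of the representation as the abstract describes it: a symbolic pre- and postcondition usable for planning, with the postcondition doubling as the definition of a reward function for the policy learner. This is not the thesis's actual code; the class name, the dict-based state, and the exact reward shaping (success bonus plus a small step cost) are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable

State = dict  # a state sketched as a mapping from fluent names to values

@dataclass
class RLTOP:
    """Sketch of a Reinforcement-Learnt Teleo-operator (RL-TOP).

    The symbolic part (pre- and postcondition) serves as a planning
    operator; the policy that actually achieves the postcondition is
    left to be learnt by reinforcement learning.
    """
    name: str
    precondition: Callable[[State], bool]   # when the behaviour may start
    postcondition: Callable[[State], bool]  # its desired consequence

    def reward(self, state: State) -> float:
        # The postcondition doubles as the reward function definition:
        # the learner is rewarded for reaching states that satisfy it.
        # The -0.01 step cost is an illustrative assumption.
        return 1.0 if self.postcondition(state) else -0.01

# Hypothetical example behaviour for a navigation domain:
goto_door = RLTOP(
    name="goto_door",
    precondition=lambda s: s.get("in_room", False),
    postcondition=lambda s: s.get("at_door", False),
)
```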
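The following is a sketch of the semi-Markov Q-learning backup at the heart of Planned Hierarchical Semi-Markov Q-Learning, under the assumption (taken from the abstract) that the behaviours available in a state are supplied by a computer-generated plan rather than a trainer-built task hierarchy. All names here are hypothetical: `Q` is a plain dict over hashable (state, behaviour) pairs, and `plan(s)` is assumed to return the behaviours the plan recommends in state `s`.

```python
def phsmq_update(Q, state, behaviour, rewards, next_state, plan,
                 alpha=0.1, gamma=0.99):
    """One semi-Markov Q-update after `behaviour` has run to completion.

    `rewards` is the list of primitive rewards collected while the
    behaviour executed; its length is the behaviour's duration tau.
    """
    tau = len(rewards)
    # Discounted return accumulated over the behaviour's execution.
    ret = sum(gamma ** i * r for i, r in enumerate(rewards))
    # Candidate next behaviours come from the plan, not a task hierarchy.
    candidates = plan(next_state)
    backup = max((Q.get((next_state, b), 0.0) for b in candidates),
                 default=0.0)
    old = Q.get((state, behaviour), 0.0)
    # Standard SMDP backup: discount the bootstrap by gamma ** tau.
    Q[(state, behaviour)] = old + alpha * (ret + gamma ** tau * backup - old)
```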
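Teleo-Reactive Q-Learning's contribution, per the abstract, is smarter termination: the plan is consulted at every step, and the running behaviour is interrupted as soon as it is no longer appropriate, rather than being run until its own postcondition holds. A minimal execution-loop sketch follows; the `env.step`, `behaviour.policy`, and `choose` interfaces are assumed for illustration, not taken from the thesis.

```python
def trq_step(env, state, current, plan, choose):
    """One step of teleo-reactive execution (sketch).

    `plan(state)` gives the behaviours the plan considers appropriate in
    `state`; `choose` picks among them (e.g. epsilon-greedily on
    Q-values). The running behaviour is switched out the moment the
    plan stops recommending it.
    """
    appropriate = plan(state)
    if current is None or current not in appropriate:
        current = choose(state, appropriate)  # interrupt and switch
    action = current.policy(state)            # the behaviour's learnt policy
    next_state, reward = env.step(action)
    return next_state, reward, current
```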
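Finally, a sketch of the reflector's role: watch execution for plan conditions that a running behaviour unexpectedly destroys, record those events as training examples, and learn to predict them so plans can be repaired. The thesis performs the learning step with Inductive Logic Programming; since ILP engines are typically Prolog-based, a simple example store stands in for the learnt theory below, and every name here is hypothetical.

```python
class Reflector:
    """Sketch of the reflector (ILP learner replaced by a stand-in)."""

    def __init__(self):
        self.examples = []  # (state, behaviour name, clobbered condition)

    def observe(self, state, behaviour, protected_conditions):
        # Any protected condition that no longer holds is an unwanted
        # side-effect of the executing behaviour; record it as an example.
        for name, cond in protected_conditions.items():
            if not cond(state):
                self.examples.append((dict(state), behaviour.name, name))

    def predicts_side_effect(self, state, behaviour, condition_name):
        # Stand-in for the learnt ILP theory: flag previously seen
        # (behaviour, condition) pairs. The real system generalises
        # from the examples to predict side-effects in new states,
        # letting the planner route around the offending behaviour.
        return any(b == behaviour.name and c == condition_name
                   for _, b, c in self.examples)
```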