Evolutionary Development of Hierarchical Learning Structures

  • Authors:
  • S. Elfwing; E. Uchibe; K. Doya; H. I. Christensen

  • Affiliations:
  • Initial Research Project, Okinawa Institute of Science and Technology

  • Venue:
  • IEEE Transactions on Evolutionary Computation
  • Year:
  • 2007

Abstract

Hierarchical reinforcement learning (RL) algorithms can learn a policy faster than standard RL algorithms. However, the applicability of hierarchical RL algorithms is limited by the fact that the task decomposition has to be performed in advance by the human designer. We propose a Lamarckian evolutionary approach for the automatic development of the learning structure in hierarchical RL. The proposed method combines the MAXQ hierarchical RL method and genetic programming (GP). In the MAXQ framework, a subtask can optimize its policy independently of its parent task's policy, which makes it possible to reuse learned policies of the subtasks. In the proposed method, MAXQ learns the policy based on the task hierarchies obtained by GP, while GP explores appropriate hierarchies using the results of MAXQ learning. To show the validity of the proposed method, we performed simulation experiments for a foraging task in three different environmental settings. The results show a strong interconnection between the obtained learning structures and the given task environments. The main conclusion of the experiments is that GP can find a minimal strategy, i.e., a hierarchy that minimizes the number of primitive subtasks that can be executed for each type of situation. The experimental results for the most challenging environment also show that the policies of the subtasks can continue to improve, even after the structure of the hierarchy has been evolutionarily stabilized, as an effect of the Lamarckian mechanisms.
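
The abstract does not give implementation details, so the following is only a minimal illustrative sketch of the loop it describes: GP proposes task hierarchies, a MAXQ-style learner evaluates each hierarchy, and the learned subtask policies are written back into the individual so that offspring inherit them (the Lamarckian step). The subtask reuse relies on a known property of Dietterich's MAXQ decomposition, Q(i, s, a) = V(a, s) + C(i, s, a), where the subtask value V(a, s) does not depend on the parent task. All names below (Individual, maxq_learn, PRIMITIVES, etc.) are hypothetical placeholders; a real fitness evaluation would run RL episodes in the foraging environment rather than the toy stand-in used here.

```python
import random
from dataclasses import dataclass, field

# Hypothetical sketch of the Lamarckian GP + MAXQ-style loop described in the
# abstract. Not the authors' implementation; names and structures are illustrative.

PRIMITIVES = ["move_to_battery", "capture_battery", "avoid_obstacle", "wander"]

@dataclass
class Individual:
    hierarchy: list                                  # subtasks the root task may invoke
    policies: dict = field(default_factory=dict)     # learned subtask "policies" (toy scalars here)
    fitness: float = 0.0

def random_hierarchy():
    k = random.randint(1, len(PRIMITIVES))
    return Individual(hierarchy=random.sample(PRIMITIVES, k))

def maxq_learn(ind):
    """Stand-in for MAXQ learning: returns a fitness and (re)learned policies.
    A real version would run RL in the foraging environment, warm-started from
    ind.policies -- the Lamarckian inheritance of learned behaviour."""
    policies = {task: ind.policies.get(task, 0.0) + random.random()
                for task in ind.hierarchy}
    # Toy fitness: reward accumulated policy quality, penalize large hierarchies.
    fitness = sum(policies.values()) / (1 + len(ind.hierarchy))
    return fitness, policies

def crossover(a, b):
    cut_a = random.randint(0, len(a.hierarchy))
    cut_b = random.randint(0, len(b.hierarchy))
    child_tasks = a.hierarchy[:cut_a] + b.hierarchy[cut_b:] or random.sample(PRIMITIVES, 1)
    child = Individual(hierarchy=child_tasks)
    # Lamarckian step: copy any parent policy for the subtasks the child kept.
    child.policies = {t: a.policies.get(t, b.policies.get(t, 0.0))
                      for t in child.hierarchy}
    return child

def evolve(pop_size=20, generations=50):
    population = [random_hierarchy() for _ in range(pop_size)]
    for _ in range(generations):
        for ind in population:
            ind.fitness, ind.policies = maxq_learn(ind)   # evaluate and keep learned policies
        population.sort(key=lambda ind: ind.fitness, reverse=True)
        parents = population[:pop_size // 2]              # simple elitist selection
        children = [crossover(*random.sample(parents, 2))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return population[0]

if __name__ == "__main__":
    best = evolve()
    print(best.hierarchy, round(best.fitness, 3))
```

Because learned policies survive both selection and crossover in this sketch, subtask policies can keep improving across generations even after the hierarchy itself stops changing, which mirrors the behaviour the abstract reports for the most challenging environment.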