Co-evolution of Rewards and Meta-parameters in Embodied Evolution

  • Authors:
  • Stefan Elfwing, Eiji Uchibe, Kenji Doya

  • Affiliation (all authors):
  • Neural Computation Unit, Okinawa Institute of Science and Technology, Okinawa, Japan 904-2234

  • Venue:
  • Creating Brain-Like Intelligence
  • Year:
  • 2009


Abstract

Embodied evolution is a methodology for evolutionary robotics that mimics the distributed, asynchronous, and autonomous properties of biological evolution. Evaluation, selection, and reproduction are carried out through the cooperation and competition of the robots themselves, without any need for human intervention. An embodied evolution framework is therefore well suited to studying adaptive learning mechanisms for artificial agents that share the same fundamental constraints as biological agents: self-preservation and self-reproduction. In this paper we propose a framework for performing embodied evolution with a limited number of robots, by utilizing time-sharing in subpopulations of virtual agents. Within this framework, we explore the combination of within-generation learning of basic survival behaviors by reinforcement learning, and evolutionary adaptation over generations of the basic behavior selection policy, the reward functions, and the meta-parameters for reinforcement learning. We apply a biologically inspired selection scheme in which there is no explicit communication of the individuals' fitness information. Individuals can only reproduce offspring by mating, a pair-wise exchange of genotypes, and the probability that an individual reproduces offspring in its own subpopulation depends on the individual's "health", i.e., its energy level, at the mating occasion. We validate the proposed method in simulation, by comparison with evolution using standard centralized selection, and by transferring the obtained solutions to hardware using two real robots.
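The selection scheme described above can be illustrated with a minimal sketch. All names, data structures, and the energy-to-probability mapping below are assumptions for illustration; the paper's actual genotype encoding (reward weights and meta-parameters) and reproduction rule may differ.

```python
import random

def mate(genotype_a, genotype_b, rng):
    """Pair-wise exchange of genotypes, sketched here as uniform crossover."""
    return [a if rng.random() < 0.5 else b
            for a, b in zip(genotype_a, genotype_b)]

def reproduction_probability(energy, e_max=100.0):
    """Assumed mapping from 'health' (energy level) to the probability of
    producing offspring at a mating occasion; clamped to [0, 1]."""
    return max(0.0, min(1.0, energy / e_max))

def mating_event(agent_a, agent_b, rng):
    """On an encounter, each agent independently produces an offspring in its
    own subpopulation with a probability given by its current energy.
    No explicit fitness values are communicated between the agents."""
    offspring = []
    for agent in (agent_a, agent_b):
        if rng.random() < reproduction_probability(agent["energy"]):
            child = mate(agent_a["genotype"], agent_b["genotype"], rng)
            offspring.append((agent["subpop"], child))
    return offspring

if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical agents: genotype = reward/meta-parameter vector.
    a = {"genotype": [0.2, 0.8, 0.5], "energy": 90.0, "subpop": 0}
    b = {"genotype": [0.9, 0.1, 0.3], "energy": 10.0, "subpop": 1}
    print(mating_event(a, b, rng))
```

Note that the healthier agent is far more likely to place an offspring in its own subpopulation, so selection pressure emerges from survival performance alone, without any centralized fitness comparison.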