Co-evolution of Shaping Rewards and Meta-Parameters in Reinforcement Learning

  • Authors:
  • Stefan Elfwing; Eiji Uchibe; Kenji Doya; Henrik I. Christensen

  • Affiliations:
  • Centre for Autonomous Systems, Numerical Analysis and Computer Science, KTH, Sweden, and Neural Computation Unit, Okinawa Institute of Science and Technology, Japan; Neural Computation Unit, Okinawa Institute of Science and Technology, Japan; Neural Computation Unit, Okinawa Institute of Science and Technology, Japan; Centre for Autonomous Systems, Numerical Analysis and Computer Science, KTH, Sweden

  • Venue:
  • Adaptive Behavior - Animals, Animats, Software Agents, Robots, Adaptive Systems
  • Year:
  • 2008

Abstract

In this article, we explore an evolutionary approach to the optimization of potential-based shaping rewards and meta-parameters in reinforcement learning. Shaping rewards are a frequently used approach to increase the learning performance of reinforcement learning, with regard to both initial performance and convergence speed. Shaping rewards provide additional knowledge to the agent in the form of richer reward signals, which guide learning to high-rewarding states. Reinforcement learning depends critically on a few meta-parameters that modulate the learning updates or the exploration of the environment, such as the learning rate α, the discount factor of future rewards γ, and the temperature τ that controls the trade-off between exploration and exploitation in softmax action selection. We validate the proposed approach in simulation using the mountain-car task. We also transfer shaping rewards and meta-parameters, evolutionarily obtained in simulation, to hardware, using a robotic foraging task.
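
To make the abstract's ingredients concrete, the following is a minimal illustrative sketch (not the authors' implementation) of how a potential-based shaping reward, defined in the standard form F(s, s') = γΦ(s') − Φ(s), combines with the meta-parameters α, γ, and τ in a tabular SARSA update with softmax action selection. The state discretization, the potential function phi, and the parameter values shown are placeholder assumptions for illustration; in the article, the shaping potential and meta-parameters are the quantities being evolved.

    # Illustrative sketch only: tabular SARSA with a potential-based shaping
    # reward F(s, s') = gamma * Phi(s') - Phi(s) and softmax (Boltzmann)
    # action selection. All concrete values below are placeholder assumptions.
    import numpy as np

    alpha, gamma, tau = 0.1, 0.99, 1.0   # learning rate, discount factor, softmax temperature
    n_states, n_actions = 100, 3         # e.g. a coarse discretization of mountain-car
    Q = np.zeros((n_states, n_actions))
    rng = np.random.default_rng(0)

    def phi(s):
        # Placeholder potential: larger for states assumed closer to the goal.
        return s / n_states

    def softmax_action(s):
        # Temperature tau controls the exploration/exploitation trade-off.
        prefs = Q[s] / tau
        prefs -= prefs.max()             # numerical stability
        p = np.exp(prefs)
        p /= p.sum()
        return rng.choice(n_actions, p=p)

    def sarsa_update(s, a, r, s_next, a_next, done):
        # Potential-based shaping added to the environment reward; this form
        # leaves the optimal policy unchanged (Ng, Harada, & Russell, 1999).
        shaped_r = r + gamma * phi(s_next) - phi(s)
        target = shaped_r + (0.0 if done else gamma * Q[s_next, a_next])
        Q[s, a] += alpha * (target - Q[s, a])

In this sketch the shaping term only redistributes reward along trajectories, while α scales each update, γ discounts future value, and τ flattens or sharpens the action distribution; these are the quantities the evolutionary procedure tunes.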