2008 Special Issue: Finding intrinsic rewards by embodied evolution and constrained reinforcement learning

Authors:
Eiji Uchibe; Kenji Doya
Affiliations:
Okinawa Institute of Science and Technology, Okinawa 904-2234, Japan; Okinawa Institute of Science and Technology, Okinawa 904-2234, Japan, and Nara Institute of Science and Technology, Nara, Japan, and ATR Computational Neuroscience Laboratories, Japan
Venue:
Neural Networks
Year:
2008

Abstract

Understanding the design principles of reward functions is a substantial challenge in both artificial intelligence and neuroscience. Successful acquisition of a task usually requires rewards not only for goals but also for intermediate states, to promote effective exploration. This paper proposes a method for designing 'intrinsic' rewards for autonomous agents by combining constrained policy gradient reinforcement learning with embodied evolution. To validate the method, we use Cyber Rodent robots, for which the three major 'extrinsic' rewards are associated with collision avoidance, recharging from battery packs, and 'mating' by software reproduction. We show in hardware experiments that the robots can find appropriate 'intrinsic' rewards for seeing battery packs and other robots, which promote approach behaviors.
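The abstract gives only the outline of the method. The Python sketch below is a toy illustration of the two-level idea, not the authors' algorithm. An outer evolutionary loop mutates a single intrinsic-reward weight `w`. An inner loop trains a fresh learner on extrinsic reward plus `w`-weighted intrinsic reward, and each `w` is scored by the extrinsic reward its learner collects. Everything here is hypothetical: the 1-D corridor, the `closeness` feature standing in for seeing a battery pack, the function names, and all parameters. The paper's constrained policy gradient is replaced by plain REINFORCE on the summed reward, and the collision and mating rewards are not modeled.

```python
import math
import random

N_POS = 10    # hypothetical corridor: positions 0..9, "battery pack" at the right end
HORIZON = 20  # maximum steps per episode


def run_episode(theta, w, rng):
    """Run one episode of the policy P(move right) = sigmoid(theta).

    Returns (extrinsic_return, reinforce_term), where reinforce_term is the sum
    of d/dtheta log pi(a_t) times the episode's total (extrinsic + intrinsic)
    reward, i.e. a plain REINFORCE gradient sample without a baseline.
    """
    p_right = 1.0 / (1.0 + math.exp(-theta))
    pos, extrinsic, total, score = 0, 0.0, 0.0, 0.0
    for _ in range(HORIZON):
        right = rng.random() < p_right
        score += (1.0 - p_right) if right else -p_right  # d/dtheta log pi(a)
        pos = min(N_POS - 1, pos + 1) if right else max(0, pos - 1)
        closeness = pos / (N_POS - 1)  # hypothetical stand-in for "how large the battery looks"
        total += w * closeness         # intrinsic reward, scaled by the evolved weight w
        if pos == N_POS - 1:           # extrinsic reward: reaching the battery ends the episode
            extrinsic += 1.0
            total += 1.0
            break
    return extrinsic, score * total


def learn(w, rng, episodes=200, lr=0.05):
    """Train a fresh learner with intrinsic weight w; return its mean extrinsic reward."""
    theta, ext_sum = 0.0, 0.0
    for _ in range(episodes):
        ext, grad = run_episode(theta, w, rng)
        theta = max(-10.0, min(10.0, theta + lr * grad))  # clip for numerical safety
        ext_sum += ext
    return ext_sum / episodes


def evolve(generations=10, pop_size=6, seed=0):
    """Keep the intrinsic weights whose learners earn the most extrinsic reward."""
    rng = random.Random(seed)
    population = [rng.uniform(-1.0, 1.0) for _ in range(pop_size)]
    best_fit, best_w = -1.0, population[0]
    for _ in range(generations):
        # Fitness of each weight = extrinsic reward of a learner trained with it.
        scored = sorted(((learn(w, rng), w) for w in population), reverse=True)
        if scored[0][0] > best_fit:
            best_fit, best_w = scored[0]
        parents = [w for _, w in scored[: pop_size // 2]]
        # Offspring: Gaussian mutation of each surviving weight.
        population = parents + [p + rng.gauss(0.0, 0.2) for p in parents]
    return best_w, best_fit


if __name__ == "__main__":
    w, fit = evolve()
    print(f"evolved intrinsic weight {w:.2f}, mean extrinsic reward {fit:.2f}")
```

In this toy, a positive `w` rewards moving toward the battery, the kind of approach-promoting intrinsic reward the abstract describes, while a negative `w` discourages it; selection on extrinsic reward alone decides which sign and magnitude survive.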