Introduction to Reinforcement Learning
Neuroscientists have identified a neural substrate of prediction and reward in experiments with primates. The so-called dopamine neurons have been shown to code an error in the temporal prediction of rewards. Similarly, artificial systems can "learn to predict" using temporal-difference (TD) methods. Based on the resemblance between the effective reinforcement term of TD models and the responses of dopamine neurons, neuroscientists have developed a TD-learning actor-critic neural model and compared its performance with the behavior of monkeys in the laboratory. We have used such a neural network model to learn to predict variable-delay rewards in a robot spatial choice task similar to the one neuroscientists use with primates. Such an architecture implementing TD learning appears to be a promising mechanism for robotic systems that learn from simple human teaching signals in the real world.
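The actor-critic TD scheme the abstract refers to can be illustrated with a minimal tabular sketch (not the paper's actual model; all names, parameters, and the one-state choice task below are illustrative assumptions). The TD prediction error plays the role the abstract ascribes to dopamine neurons: a single signal that trains both the critic (reward prediction) and the actor (action preferences).

```python
import math
import random

def softmax(prefs):
    # turn action preferences into a probability distribution
    exps = [math.exp(p) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def train_actor_critic(reward_probs, episodes=2000, alpha=0.1, beta=0.1, seed=0):
    """Tabular actor-critic on a hypothetical one-state choice task.

    reward_probs[a] is the probability that action a yields reward 1.0.
    The TD error (reward minus the critic's prediction) updates both
    the critic's value estimate and the actor's preferences.
    """
    rng = random.Random(seed)
    prefs = [0.0] * len(reward_probs)  # actor: one preference per action
    value = 0.0                        # critic: predicted reward in this state
    for _ in range(episodes):
        probs = softmax(prefs)
        # sample an action from the softmax policy
        a = rng.choices(range(len(prefs)), weights=probs)[0]
        r = 1.0 if rng.random() < reward_probs[a] else 0.0
        delta = r - value              # TD (prediction) error, dopamine-like signal
        value += alpha * delta         # critic update: improve reward prediction
        prefs[a] += beta * delta       # actor update: reinforce better-than-expected actions
    return prefs, value

prefs, value = train_actor_critic([0.8, 0.2])
```

After training, the preference for the richer action dominates and the critic's value settles near the reward rate of the learned policy; the same error signal drives both updates, which is the property the dopamine-recording experiments highlighted.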