Using a time-delay actor-critic neural architecture with dopamine-like reinforcement signal for learning in autonomous robots

  • Authors: Andrés Pérez-Uribe
  • Affiliations: Parallelism and Artificial Intelligence Group, Department of Informatics, University of Fribourg, Switzerland
  • Venue: Emergent neural computational architectures based on neuroscience
  • Year: 2001

Abstract

Neuroscientists have identified a neural substrate of prediction and reward in experiments with primates. The so-called dopamine neurons have been shown to encode an error in the temporal prediction of rewards. Similarly, artificial systems can "learn to predict" using so-called temporal-difference (TD) methods. Based on the general resemblance between the effective reinforcement term of TD models and the response of dopamine neurons, neuroscientists have developed a TD-learning time-delay actor-critic neural model and compared its performance with the behavior of monkeys in the laboratory. We have used such a neural network model to learn to predict variable-delay rewards in a robot spatial choice task similar to the one used by neuroscientists with primates. Such an architecture implementing TD-learning appears to be a promising mechanism for robotic systems that learn from simple human teaching signals in the real world.
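
For orientation, the sketch below shows the core of a tabular TD-learning actor-critic, in which the TD error plays the role of the "effective reinforcement" (dopamine-like) signal the abstract refers to. It is a minimal illustration, not the paper's time-delay architecture: the class name `TDActorCritic`, the tabular state representation, and the hyperparameters `gamma`, `alpha_v`, and `alpha_p` are all assumptions made for the example.

```python
import numpy as np

class TDActorCritic:
    """Minimal tabular actor-critic; names and hyperparameters are illustrative."""

    def __init__(self, n_states, n_actions, gamma=0.9, alpha_v=0.1, alpha_p=0.1):
        self.gamma = gamma          # discount factor for future rewards
        self.alpha_v = alpha_v      # critic (value) learning rate
        self.alpha_p = alpha_p      # actor (policy) learning rate
        self.values = np.zeros(n_states)               # critic: V(s)
        self.prefs = np.zeros((n_states, n_actions))   # actor: action preferences

    def act(self, state):
        # Softmax over action preferences (numerically stabilized).
        exp_p = np.exp(self.prefs[state] - self.prefs[state].max())
        probs = exp_p / exp_p.sum()
        return np.random.choice(len(probs), p=probs)

    def update(self, state, action, reward, next_state, done):
        # TD error: the prediction-error term whose resemblance to the
        # dopamine response motivates the model described in the abstract.
        target = reward if done else reward + self.gamma * self.values[next_state]
        delta = target - self.values[state]
        self.values[state] += self.alpha_v * delta          # critic update
        self.prefs[state, action] += self.alpha_p * delta   # actor update
        return delta
```

In a setting like the paper's robot spatial choice task, the reward passed to `update` would arrive after a variable delay (e.g., from a human teaching signal); `update` returns the TD error so a caller can inspect the dopamine-like signal directly.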