Multiple paired forward and inverse models for motor control. Neural Networks (special issue on neural control and robotics: biology and technology).
Multiple model-based reinforcement learning. Neural Computation.
Learning to Predict by the Methods of Temporal Differences. Machine Learning.
Long-term reward prediction in TD models of the dopamine system. Neural Computation.
Reinforcement learning models of the dopamine system and their behavioral implications.
Representation and timing in theories of the dopamine system. Neural Computation.
Noisy-or nodes for conditioning models. SAB'10: Proceedings of the 11th International Conference on Simulation of Adaptive Behavior: From Animals to Animats.
A number of computational models have explained the behavior of dopamine neurons in terms of temporal difference (TD) learning. However, earlier models cannot account for recent conditioning results; in particular, they fail to reproduce the behavior of dopamine neurons when the interval between a cue stimulus and a reward varies. We address this problem with a modular architecture in which each module consists of a reward predictor and a value estimator. A "responsibility signal", computed from the accuracy of each reward predictor, weights both the contributions and the learning of the corresponding value estimators. This multiple-model architecture accurately accounts for the behavior of dopamine neurons in two specific experiments: when the reward is delivered earlier than expected, and when the stimulus-reward interval varies uniformly over a fixed range.
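The architecture described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the module count, noise scale, learning rate, and tabular state representation are all assumptions made for the example, and the responsibility signal is computed here as a softmax over each module's Gaussian reward-prediction likelihood, which then gates both the reward-predictor and value-estimator updates.

```python
import numpy as np

# Hypothetical parameters (not from the paper; illustrative only)
n_modules = 2    # number of (reward predictor, value estimator) pairs
n_states = 10    # discretized time steps between cue and reward
gamma = 0.9      # discount factor
alpha = 0.1      # learning rate
sigma = 0.5      # assumed noise scale for responsibility computation

rng = np.random.default_rng(0)
reward_pred = rng.normal(0.0, 0.1, size=(n_modules, n_states))  # reward predictors
values = np.zeros((n_modules, n_states))                        # value estimators

def responsibilities(state, reward):
    """Softmax of each module's reward-prediction accuracy.

    Modules whose reward predictor matches the observed reward
    receive responsibility close to 1 and dominate learning.
    """
    errors = reward - reward_pred[:, state]
    log_lik = -0.5 * (errors / sigma) ** 2
    w = np.exp(log_lik - log_lik.max())
    return w / w.sum()

def step(state, next_state, reward):
    """One responsibility-weighted TD update across all modules."""
    lam = responsibilities(state, reward)
    # move each reward predictor toward the observed reward,
    # scaled by its responsibility
    reward_pred[:, state] += alpha * lam * (reward - reward_pred[:, state])
    # standard TD(0) error per module, again gated by responsibility
    td_error = reward + gamma * values[:, next_state] - values[:, state]
    values[:, state] += alpha * lam * td_error
    return lam

# Usage: train on a simple fixed-interval cue -> reward sequence
for episode in range(200):
    for s in range(n_states - 1):
        r = 1.0 if s == n_states - 2 else 0.0
        step(s, s + 1, r)
```

After training, the value estimates rise toward the reward time, and the responsibility signal determines which module's value estimator is trusted at each step; with a variable stimulus-reward interval, different modules would specialize on different intervals.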