This article addresses the relationship between long-term reward predictions and slow-timescale neural activity in temporal difference (TD) models of the dopamine system. Such models attempt to explain how the activity of dopamine (DA) neurons relates to errors in the prediction of future reward. Previous models have mostly been restricted to short-term predictions of rewards expected within a single, somewhat artificially defined trial, and they focused exclusively on the phasic pause-and-burst activity of primate DA neurons, treating the neurons' slower, tonic background activity as constant. This has made it difficult to explain the results of neurochemical experiments that measure indications of DA release on a slow timescale, results that seem at first glance inconsistent with a reward-prediction model. In this article, we investigate a TD model of DA activity modified to make longer-term predictions about rewards expected far in the future. We show that these predictions manifest as slow changes in the baseline error signal, which we associate with tonic DA activity. Using this model, we make new predictions about the behavior of the DA system in a number of experimental situations. Some of these predictions suggest new computational explanations for previously puzzling data, such as microdialysis findings of elevated DA activity triggered by aversive events.
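The abstract does not spell out the learning rule, but the long-term-prediction mechanism it describes reads naturally as an average-reward TD model, in which the TD error is computed relative to a slowly adapting estimate of the long-run reward rate: the error term then plays the role of the phasic DA signal and the baseline term the role of tonic DA. The Python sketch below illustrates that formulation under those assumptions; the function names, learning rates, and the simple cue-reward loop are illustrative, not the authors' implementation.

# Minimal sketch of average-reward TD(0), assuming the mapping
# phasic DA ~ TD error (delta) and tonic DA ~ average-reward baseline (rho).
# All names and constants here are illustrative assumptions.

ALPHA = 0.1   # learning rate for state values
ETA = 0.01    # slower learning rate for the average-reward baseline

V = {}        # state -> predicted differential value
rho = 0.0     # running estimate of the long-run reward rate ("tonic" term)

def td_step(s, r, s_next):
    """Apply one average-reward TD(0) update and return the TD error."""
    global rho
    delta = r - rho + V.get(s_next, 0.0) - V.get(s, 0.0)
    V[s] = V.get(s, 0.0) + ALPHA * delta
    rho += ETA * delta    # baseline drifts slowly when errors persist
    return delta

# Usage: a three-state cue-reward cycle delivering one reward every
# three steps; rho settles near the true reward rate of 1/3.
states = ["iti", "cue", "reward"]
for _ in range(2000):
    for i, s in enumerate(states):
        r = 1.0 if s == "reward" else 0.0
        td_step(s, r, states[(i + 1) % len(states)])
print(round(rho, 2))      # ~0.33

Because rho is updated far more slowly than V, transient prediction errors show up as fast changes in delta while sustained changes in the reward rate accumulate into the baseline, which is the separation of timescales the article associates with phasic versus tonic DA activity.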