Although the responses of dopamine neurons in the primate midbrain are well characterized as carrying a temporal difference (TD) error signal for reward prediction, existing theories do not offer a credible account of how the brain keeps track of past sensory events that may be relevant to predicting future reward. Empirically, these shortcomings of previous theories are particularly evident in their account of experiments in which animals were exposed to variation in the timing of events. The original theories mispredicted the results of such experiments due to their use of a representational device called a tapped delay line.

Here we propose that a richer understanding of history representation and a better account of these experiments can be given by considering TD algorithms for a formal setting that incorporates two features not originally considered in theories of the dopaminergic response: partial observability (a distinction between the animal's sensory experience and the true underlying state of the world) and semi-Markov dynamics (an explicit account of variation in the intervals between events). The new theory situates the dopaminergic system in a richer functional and anatomical context, since it assumes (in accord with recent computational theories of cortex) that problems of partial observability and stimulus history are solved in sensory cortex using statistical modeling and inference, and that the TD system predicts reward using the results of this inference rather than raw sensory data.

The theory also accounts for a range of experimental data, including the experiments involving programmed temporal variability and other previously unmodeled dopaminergic response phenomena, which we suggest are related to subjective noise in animals' interval timing. Finally, it offers new experimental predictions and a rich theoretical framework for designing future experiments.
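The two ingredients the abstract names — a belief state inferred from partial observations, and semi-Markov discounting over variable inter-event intervals — can be illustrated with a minimal sketch. This is not the paper's implementation; the transition matrix, observation likelihoods, discount factor, and linear value weights below are all illustrative assumptions. The point is only that the TD error is computed over the inferred belief vector, discounted by the elapsed dwell time, rather than over raw sensory input.

```python
import numpy as np

GAMMA = 0.98  # per-unit-time discount factor (assumed value)

def belief_update(belief, obs_likelihood, transition):
    """Bayesian filter step: predict the hidden state forward, then
    reweight by the likelihood of the current observation."""
    predicted = transition.T @ belief        # prior over the next hidden state
    posterior = obs_likelihood * predicted   # fold in the observation
    return posterior / posterior.sum()       # renormalize to a distribution

def td_error(w, belief, next_belief, reward, dwell):
    """Semi-Markov TD error over belief states: the successor value is
    discounted by GAMMA**dwell, where dwell is the (variable) interval
    between the two events."""
    v = w @ belief            # linear value of the current belief
    v_next = w @ next_belief  # linear value of the updated belief
    return reward + (GAMMA ** dwell) * v_next - v

# Toy example: two hidden states (pre-reward, reward-predictive).
transition = np.array([[0.9, 0.1],
                       [0.0, 1.0]])
w = np.array([0.0, 1.0])              # value weights (assumed)
belief = np.array([1.0, 0.0])         # start certain in state 0
obs_likelihood = np.array([0.2, 0.8]) # current cue favors state 1

next_belief = belief_update(belief, obs_likelihood, transition)
delta = td_error(w, belief, next_belief, reward=1.0, dwell=3.0)
print(next_belief, delta)
```

Under this sketch, a longer dwell time shrinks the discounted successor value, so the same reward delivered after a longer-than-expected interval yields a larger positive TD error — qualitatively the kind of timing sensitivity the abstract attributes to the dopaminergic response.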