Although the responses of dopamine neurons in the primate midbrain are well characterized as carrying a temporal difference (TD) error signal for reward prediction, existing theories do not offer a credible account of how the brain keeps track of past sensory events that may be relevant to predicting future reward. Empirically, these shortcomings of previous theories are particularly evident in their account of experiments in which animals were exposed to variation in the timing of events. The original theories mispredicted the results of such experiments due to their use of a representational device called a tapped delay line.

Here we propose that a richer understanding of history representation and a better account of these experiments can be given by considering TD algorithms for a formal setting that incorporates two features not originally considered in theories of the dopaminergic response: partial observability (a distinction between the animal's sensory experience and the true underlying state of the world) and semi-Markov dynamics (an explicit account of variation in the intervals between events). The new theory situates the dopaminergic system in a richer functional and anatomical context, since it assumes (in accord with recent computational theories of cortex) that problems of partial observability and stimulus history are solved in sensory cortex using statistical modeling and inference, and that the TD system predicts reward using the results of this inference rather than raw sensory data.

The theory also accounts for a range of experimental data, including the experiments involving programmed temporal variability and other previously unmodeled dopaminergic response phenomena, which we suggest are related to subjective noise in animals' interval timing. Finally, it offers new experimental predictions and a rich theoretical framework for designing future experiments.
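The two ingredients the abstract names — a belief state inferred from partial observations, and semi-Markov discounting over variable inter-event intervals — can be illustrated with a minimal sketch. This is not the paper's implementation; the transition matrix, observation likelihoods, discount factor, and linear value weights below are all illustrative assumptions. The point is only that the TD error is computed over the inferred belief vector, discounted by the elapsed dwell time, rather than over raw sensory input.

```python
import numpy as np

GAMMA = 0.98  # per-unit-time discount factor (assumed value)

def belief_update(belief, obs_likelihood, transition):
    """Bayesian filter step: predict the hidden state forward, then
    reweight by the likelihood of the current observation."""
    predicted = transition.T @ belief        # prior over the next hidden state
    posterior = obs_likelihood * predicted   # fold in the observation
    return posterior / posterior.sum()       # renormalize to a distribution

def td_error(w, belief, next_belief, reward, dwell):
    """Semi-Markov TD error over belief states: the successor value is
    discounted by GAMMA**dwell, where dwell is the (variable) interval
    between the two events."""
    v = w @ belief            # linear value of the current belief
    v_next = w @ next_belief  # linear value of the updated belief
    return reward + (GAMMA ** dwell) * v_next - v

# Toy example: two hidden states (pre-reward, reward-predictive).
transition = np.array([[0.9, 0.1],
                       [0.0, 1.0]])
w = np.array([0.0, 1.0])              # value weights (assumed)
belief = np.array([1.0, 0.0])         # start certain in state 0
obs_likelihood = np.array([0.2, 0.8]) # current cue favors state 1

next_belief = belief_update(belief, obs_likelihood, transition)
delta = td_error(w, belief, next_belief, reward=1.0, dwell=3.0)
print(next_belief, delta)
```

Under this sketch, a longer dwell time shrinks the discounted successor value, so the same reward delivered after a longer-than-expected interval yields a larger positive TD error — qualitatively the kind of timing sensitivity the abstract attributes to the dopaminergic response.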