The phasic firing of dopamine neurons has been theorized to encode a reward-prediction error as formalized by the temporal-difference (TD) algorithm in reinforcement learning. Most TD models of dopamine have assumed a stimulus representation, known as the complete serial compound, in which each moment in a trial is distinctly represented. We introduce a more realistic temporal stimulus representation for the TD model. In our model, all external stimuli, including rewards, spawn a series of internal microstimuli, which grow weaker and more diffuse over time. These microstimuli are used by the TD learning algorithm to generate predictions of future reward. This new stimulus representation injects temporal generalization into the TD model and enhances correspondence between model and data in several experiments, including those in which rewards are omitted or received early. This improved fit mostly derives from the absence of large negative errors in the new model, suggesting that dopamine alone can encode the full range of TD errors in these situations.
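The idea described above can be sketched in code: a stimulus leaves a decaying memory trace, a bank of Gaussian basis functions over the trace's height yields microstimuli that grow weaker and broader with elapsed time, and linear TD(λ) learns reward predictions from those features. This is a minimal illustration, not the authors' implementation: the parameter values (trace decay, number of microstimuli, width `sigma`, learning rate), the use of stimulus-spawned microstimuli only (the paper also has rewards spawn microstimuli), and the function names are all assumptions made for the example.

```python
import numpy as np

def microstimuli(trace_height, n=10, sigma=0.08):
    """Microstimulus features: Gaussian basis functions over the height of a
    decaying memory trace, scaled by the trace so later features are weaker
    and more diffuse (illustrative parameterization, not the paper's exact one)."""
    centers = np.linspace(1.0 / n, 1.0, n)
    return trace_height * np.exp(-((trace_height - centers) ** 2) / (2 * sigma ** 2))

def run_trials(n_trials=200, trial_len=20, reward_time=10,
               alpha=0.1, gamma=0.98, lam=0.9, decay=0.9, n=10):
    """Linear TD(lambda) over microstimulus features; a single stimulus at t=0
    is followed by a unit reward at `reward_time`."""
    w = np.zeros(n)                       # weights for the value function V = w.x
    deltas_at_reward = []                 # TD error at the moment of reward, per trial
    for _ in range(n_trials):
        e = np.zeros(n)                   # eligibility traces
        x = microstimuli(1.0, n)          # stimulus onset: trace height = 1
        for t in range(trial_len):
            trace = decay ** (t + 1)      # memory trace decays each time step
            x_next = microstimuli(trace, n) if t + 1 < trial_len else np.zeros(n)
            r = 1.0 if t + 1 == reward_time else 0.0
            v, v_next = w @ x, w @ x_next
            delta = r + gamma * v_next - v          # TD reward-prediction error
            e = gamma * lam * e + x
            w += alpha * delta * e
            if t + 1 == reward_time:
                deltas_at_reward.append(delta)
            x = x_next
    return w, deltas_at_reward
```

Over training, the TD error at reward delivery shrinks as the microstimulus features come to predict the reward, and the learned prediction at stimulus onset becomes positive; because neighboring time points share overlapping microstimuli, the representation generalizes across time rather than treating each moment as a distinct state (the complete serial compound).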