We compare and contrast two recent computational models of dopamine-neuron activity in the human central nervous system at the level of single cells. Both models implement reinforcement learning using the method of temporal differences (TD), and both address drawbacks of earlier TD models by incorporating an internal model of the environment. The principal difference lies in how fully each internal model captures the properties of the environment: one employs a partially observable semi-Markov environment, while the other applies a form of transition matrix iteratively to generate the sum of future predictions. We show that the two internal models rest on fundamentally different assumptions and that those assumptions are problematic in each case. Both models, to differing degrees, leave their biological implementation unspecified. In addition, the model based on the partially observable semi-Markov environment appears to contain redundant features, while the alternative model appears to lack generalizability.
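For background, the TD method underlying both models computes a reward-prediction error, which is the signal commonly identified with phasic dopamine activity. The following is a minimal tabular TD(0) sketch, not an implementation of either model under comparison; the toy states, rewards, and parameters are illustrative only:

```python
# Minimal tabular TD(0) sketch of the reward-prediction error that
# TD models associate with phasic dopamine activity.
# All states, rewards, and parameters here are illustrative.

alpha = 0.1   # learning rate
gamma = 0.9   # discount factor

# Value estimates V(s) for a toy chain of states: cue -> delay -> reward.
V = {"cue": 0.0, "delay": 0.0, "reward": 0.0}

# One trial as (state, next_state, reward) transitions;
# reward of 1.0 is delivered on the final transition.
episode = [("cue", "delay", 0.0), ("delay", "reward", 1.0)]

for _ in range(200):  # repeat trials so predictions converge
    for s, s_next, r in episode:
        # TD error (the putative dopamine signal): reward plus
        # discounted prediction at the next state, minus the
        # current prediction.
        delta = r + gamma * V[s_next] - V[s]
        V[s] += alpha * delta

# After learning, the cue comes to predict the discounted future reward.
print(round(V["cue"], 2), round(V["delay"], 2))  # → 0.9 1.0
```

After training, the TD error at the time of the (now fully predicted) reward shrinks toward zero, while an error appears at the earlier predictive cue — the hallmark transfer of the dopamine response that both models aim to reproduce.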