Reinforcement learning models of the dopamine system and their behavioral implications

  • Authors:
  • Nathaniel D. Daw; David S. Touretzky


  • Venue:
  • Ph.D. thesis
  • Year:
  • 2003


Abstract

This thesis aims to improve theories of how the brain functions and to provide a framework to guide future neuroscientific experiments by making use of theoretical and algorithmic ideas from computer science. The work centers around the detailed understanding of the dopamine system, an important and phylogenetically venerable brain system that is implicated in such general functions as motivation, decision-making and motor control, and whose dysfunction is associated with disorders such as schizophrenia, addiction, and Parkinson's disease. A series of influential models have proposed that the responses of dopamine neurons recorded from behaving monkeys can be identified with the error signal from temporal difference (TD) learning, a reinforcement learning algorithm for learning to predict rewards in order to guide decision-making. Here I propose extensions to these theories that improve them along a number of dimensions simultaneously. The new models that result eliminate several unrealistic simplifying assumptions from the original accounts; explain many sorts of dopamine responses that had previously seemed anomalous; flesh out nascent suggestions that these neurophysiological mechanisms can also explain animal behavior in conditioning experiments; and extend the theories' reach to incorporate proposals about the computational function of several other brain systems that interact with the dopamine neurons.

Chapter 3 relaxes the assumption from previous models that the system tracks only short-term predictions about rewards expected within a single experimental trial. It introduces a new model based on average-reward TD learning that suggests that long-run reward predictions affect the slow-timescale, tonic behavior of dopamine neurons. This account resolves a seemingly paradoxical finding that the dopamine system is excited by aversive events such as electric shock, which had fueled several published attacks on the TD theories.
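The contrast between the standard TD error signal and the average-reward variant that Chapter 3 builds on can be sketched as follows. This is a generic illustration of the two error signals, not the thesis's actual model; the function names and parameters are mine:

```python
def td_errors(rewards, values, gamma=0.95):
    """Discounted TD errors: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t).

    `values` holds V(s_t) for each step; V beyond the final step is taken as 0.
    The phasic dopamine response is identified with delta_t in the standard models.
    """
    deltas = []
    for t, r in enumerate(rewards):
        v_next = values[t + 1] if t + 1 < len(values) else 0.0
        deltas.append(r + gamma * v_next - values[t])
    return deltas


def avg_reward_td_errors(rewards, values, rho):
    """Average-reward TD errors: delta_t = r_t - rho + V(s_{t+1}) - V(s_t).

    rho is the long-run average reward rate. Subtracting it at every step is
    what links slow-timescale (tonic) signalling to long-run reward predictions.
    """
    deltas = []
    for t, r in enumerate(rewards):
        v_next = values[t + 1] if t + 1 < len(values) else 0.0
        deltas.append(r - rho + v_next - values[t])
    return deltas
```

In the average-reward form, a sustained change in the reward rate rho shifts every error by a constant, which is one way a tonic signal can carry long-run information that the discounted, within-trial form cannot.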
These investigations also provide a basis for proposals about the functional role of interactions between the dopamine and serotonin systems, and about behavioral data on animal decision-making.

Chapter 4 further revises the theory to account for animals' uncertainty about the timing of events and about the moment-to-moment state of an experimental task. These issues are handled in the context of a TD algorithm incorporating partial observability and semi-Markov dynamics; a number of other new or extant models are shown to follow from this one in various limits.

Chapter 5 departs from the thesis's primary methodology of computational modeling to present a complementary attempt to address the same issues empirically. (Abstract shortened by UMI.)
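One way to picture the partial-observability half of Chapter 4's account is to run TD learning over a belief distribution about the hidden task state, updated by Bayes' rule, rather than over the state itself. The sketch below is generic: the two-state parameters `T`, `O`, and `R` are invented for illustration, and the semi-Markov (dwell-time) side of the thesis's algorithm is omitted:

```python
import numpy as np

# Hypothetical two-hidden-state task (parameters are illustrative only).
T = np.array([[0.9, 0.1],    # P(next hidden state | current hidden state)
              [0.2, 0.8]])
O = np.array([[0.8, 0.2],    # P(observation | hidden state)
              [0.3, 0.7]])
R = np.array([0.0, 1.0])     # expected reward in each hidden state


def belief_update(b, obs):
    """Bayes-filter step: propagate the belief through T, then reweight by
    the likelihood of the observation and renormalize."""
    pred = b @ T
    post = pred * O[:, obs]
    return post / post.sum()


def td_step(w, b, b_next, reward, alpha=0.1, gamma=0.95):
    """TD(0) over belief states with a linear value function V(b) = w . b.

    The TD error is computed between consecutive beliefs, so state
    uncertainty enters the error signal directly.
    """
    delta = reward + gamma * (w @ b_next) - (w @ b)
    return w + alpha * delta * b, delta
```

Because the value function is linear in the belief, an ambiguous observation (a belief spread across states) produces a graded value estimate and hence a graded prediction error, which is the qualitative behavior a partially observable account needs.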