Learning from delayed reward and punishment in a spiking neural network model of basal ganglia with opposing D1/D2 plasticity

  • Authors:
  • Jenia Jitsev;Nobi Abraham;Abigail Morrison;Marc Tittgemeyer

  • Affiliations:
  • Cortical Networks and Cognitive Functions Group, Max-Planck-Institute for Neurological Research, Cologne, Germany;Cortical Networks and Cognitive Functions Group, Max-Planck-Institute for Neurological Research, Cologne, Germany;Functional Neural Circuits Group, Bernstein Center Freiburg, Albert-Ludwig University of Freiburg, Freiburg, Germany;Cortical Networks and Cognitive Functions Group, Max-Planck-Institute for Neurological Research, Cologne, Germany

  • Venue:
  • ICANN'12 Proceedings of the 22nd international conference on Artificial Neural Networks and Machine Learning - Volume Part I
  • Year:
  • 2012

Abstract

Extending previous work, we introduce a spiking actor-critic network model of learning from reward and punishment in the basal ganglia. In the model, the striatum is taken to be segregated into populations of medium spiny neurons (MSNs) that express either the D1 or the D2 dopamine receptor type. This segregation allows explicit representation of both positive and negative expected outcomes within the respective populations. In line with recent experiments, we further assume that the D1 and D2 MSN populations have opposing dopamine-modulated bidirectional synaptic plasticity. Experiments were conducted in a grid world, where a moving agent had to reach a remote rewarded goal state. Unlike the previous model, the network learned not only to approach the rewarded goal but also to consistently avoid punishments. The spiking network model explains the functional role of D1/D2 MSN segregation within the striatum, specifically the reversed direction of dopamine-dependent plasticity found at synapses converging on different MSNs.
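The idea of opposing D1/D2 plasticity can be illustrated with a minimal, rate-based Python sketch. It is not the spiking model described in the abstract: the function names, the learning rate, the clipping of weights at zero, and the use of a simple prediction error as the dopamine signal are all assumptions made for illustration. The sketch only shows how two non-negative weights that are updated in opposite directions by the same dopamine-like signal can separately store positive and negative expected outcomes.

```python
def opposing_update(w_d1, w_d2, dopamine, activity=1.0, lr=0.1):
    """Apply one plasticity step with reversed sign for D1 and D2 weights.

    Hypothetical rate-based rule: a positive dopamine signal strengthens
    the D1 weight and weakens the D2 weight; a negative signal does the
    opposite. Weights are clipped at zero (an assumption of this sketch).
    """
    w_d1 = max(0.0, w_d1 + lr * dopamine * activity)
    w_d2 = max(0.0, w_d2 - lr * dopamine * activity)
    return w_d1, w_d2


def learn_outcome(outcome, steps=200, lr=0.1):
    """Repeatedly experience one outcome and return the learned weights."""
    w_d1, w_d2 = 0.0, 0.0
    for _ in range(steps):
        # Net expected outcome: D1 encodes the positive part, D2 the negative.
        expected = w_d1 - w_d2
        # Dopamine-like signal modeled as a prediction error (assumption).
        dopamine = outcome - expected
        w_d1, w_d2 = opposing_update(w_d1, w_d2, dopamine, lr=lr)
    return w_d1, w_d2


if __name__ == "__main__":
    print("reward:    ", learn_outcome(+1.0))
    print("punishment:", learn_outcome(-1.0))
```

Under these assumptions, a repeated reward is stored in the D1 weight and a repeated punishment in the D2 weight, while the other weight stays at zero.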