A spiking neural network model of an actor-critic learning agent

  • Authors:
  • Wiebke Potjans, Abigail Morrison, Markus Diesmann

  • Venue:
  • Neural Computation
  • Year:
  • 2009

Abstract

The ability to adapt behavior to maximize reward as a result of interactions with the environment is crucial for the survival of any higher organism. In the framework of reinforcement learning, temporal-difference learning algorithms provide an effective strategy for such goal-directed adaptation, but it is unclear to what extent these algorithms are compatible with neural computation. In this article, we present a spiking neural network model that implements actor-critic temporal-difference learning by combining local plasticity rules with a global reward signal. The network is capable of solving a nontrivial gridworld task with sparse rewards. We derive a quantitative mapping of plasticity parameters and synaptic weights to the corresponding variables in the standard algorithmic formulation and demonstrate that the network learns at a speed comparable to that of its discrete-time counterpart and attains the same equilibrium performance.
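
The abstract refers to a "standard algorithmic formulation" that the spiking network is mapped onto. For orientation, a minimal sketch of that kind of discrete-time counterpart, tabular actor-critic TD(0) on a small gridworld with a single sparse reward, might look like the following. The grid size, learning rates, and discount factor here are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

# Minimal tabular actor-critic TD(0) on a gridworld with one sparse reward.
# All names and constants (grid size, alpha, beta, gamma) are illustrative
# assumptions; they are not drawn from the paper itself.

rng = np.random.default_rng(0)

N = 5                                    # 5x5 gridworld, goal in one corner
GOAL = (N - 1, N - 1)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

V = np.zeros((N, N))                     # critic: state-value estimates
prefs = np.zeros((N, N, len(ACTIONS)))   # actor: action preferences

alpha, beta, gamma = 0.1, 0.1, 0.9       # critic lr, actor lr, discount

def step(state, a):
    """Apply action a; walls keep the agent in place. Reward only at goal."""
    r, c = state
    dr, dc = ACTIONS[a]
    nr = min(max(r + dr, 0), N - 1)
    nc = min(max(c + dc, 0), N - 1)
    reward = 1.0 if (nr, nc) == GOAL else 0.0
    return (nr, nc), reward

for episode in range(500):
    state = (0, 0)
    while state != GOAL:
        # Softmax policy over the actor's preferences for this state
        p = np.exp(prefs[state] - prefs[state].max())
        p /= p.sum()
        a = rng.choice(len(ACTIONS), p=p)

        next_state, reward = step(state, a)

        # TD error: a single scalar shared by both actor and critic,
        # analogous to the global reward signal in the spiking model
        target = reward + (0.0 if next_state == GOAL else gamma * V[next_state])
        delta = target - V[state]

        V[state] += alpha * delta          # critic update
        prefs[state][a] += beta * delta    # actor update
        state = next_state
```

In the spiking model described by the abstract, the tabular quantities V and prefs correspond to synaptic weights, and the scalar TD error delta corresponds to the global reward signal that modulates local plasticity rules.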