The persistent modification of synaptic efficacy as a function of the relative timing of pre- and postsynaptic spikes is a phenomenon known as spike-timing-dependent plasticity (STDP). Here we show that the modulation of STDP by a global reward signal leads to reinforcement learning. We first analytically derive learning rules involving reward-modulated spike-timing-dependent synaptic and intrinsic plasticity, by applying a reinforcement learning algorithm to the stochastic spike response model of spiking neurons. These rules share several features with plasticity mechanisms experimentally found in the brain. We then demonstrate, in simulations of networks of integrate-and-fire neurons, the efficacy of two simple learning rules involving modulated STDP. One rule is a direct extension of the standard STDP model (modulated STDP); the other involves an eligibility trace stored at each synapse that keeps a decaying memory of recent pre- and postsynaptic spike pairings (modulated STDP with eligibility trace). This latter rule permits learning even when the reward signal is delayed. The proposed rules are able to solve the XOR problem with both rate-coded and temporally coded input and to learn a target output firing-rate pattern. These learning rules are biologically plausible, may be used for training generic artificial spiking neural networks regardless of the neural model used, and motivate experimental investigation of whether reward-modulated STDP exists in animals.
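To illustrate the second rule described above (modulated STDP with eligibility trace), the following is a minimal, hedged sketch for a single synapse: each pre/post spike pairing contributes a standard STDP term to a decaying eligibility trace, and the trace is converted into a weight change only when a (possibly delayed) global reward arrives. All parameter values, function names, and the simplified all-to-all pairing scheme are illustrative assumptions, not taken from the paper.

```python
import math

# Illustrative parameters (assumed values, not from the original paper).
TAU_PLUS = 20.0    # ms, potentiation window time constant
TAU_MINUS = 20.0   # ms, depression window time constant
A_PLUS = 1.0       # potentiation amplitude
A_MINUS = 1.0      # depression amplitude
TAU_E = 500.0      # ms, eligibility-trace decay time constant
LR = 0.01          # learning rate

def stdp(dt):
    """Standard exponential STDP window; dt = t_post - t_pre (ms)."""
    if dt >= 0:
        return A_PLUS * math.exp(-dt / TAU_PLUS)    # pre before post: potentiate
    return -A_MINUS * math.exp(dt / TAU_MINUS)      # post before pre: depress

def run_synapse(pre_spikes, post_spikes, reward_time, reward, w0=0.5):
    """Accumulate pairwise STDP contributions into an eligibility trace
    that decays until the reward arrives, then apply the weight change
    proportional to reward * trace (modulated STDP with eligibility trace)."""
    e = 0.0
    for tp in pre_spikes:
        for tq in post_spikes:
            t_evt = max(tp, tq)                     # pairing is complete at the later spike
            if t_evt <= reward_time:
                # each pairing's contribution decays from its own event time
                e += stdp(tq - tp) * math.exp(-(reward_time - t_evt) / TAU_E)
    return w0 + LR * reward * e
```

For example, a pre spike at 0 ms followed by a post spike at 10 ms leaves a positive trace, so a positive reward delivered later at 200 ms strengthens the synapse, while a negative reward weakens it; reversing the spike order reverses the sign of the trace.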