The persistent modification of synaptic efficacy as a function of the relative timing of pre- and postsynaptic spikes is a phenomenon known as spike-timing-dependent plasticity (STDP). Here we show that the modulation of STDP by a global reward signal leads to reinforcement learning. We first analytically derive learning rules involving reward-modulated spike-timing-dependent synaptic and intrinsic plasticity, by applying a reinforcement learning algorithm to the stochastic spike response model of spiking neurons. These rules share several features with plasticity mechanisms experimentally found in the brain. We then demonstrate, in simulations of networks of integrate-and-fire neurons, the efficacy of two simple learning rules involving modulated STDP. One rule is a direct extension of the standard STDP model (modulated STDP); the other involves an eligibility trace stored at each synapse that keeps a decaying memory of recent pre- and postsynaptic spike pairings (modulated STDP with eligibility trace). This latter rule permits learning even when the reward signal is delayed. The proposed rules are able to solve the XOR problem with both rate-coded and temporally coded input and to learn a target output firing-rate pattern. These learning rules are biologically plausible, may be used for training generic artificial spiking neural networks regardless of the neural model used, and motivate experimental investigation of whether reward-modulated STDP exists in animals.
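To illustrate the second rule described above (modulated STDP with eligibility trace), the following is a minimal, hedged sketch for a single synapse: each pre/post spike pairing contributes a standard STDP term to a decaying eligibility trace, and the trace is converted into a weight change only when a (possibly delayed) global reward arrives. All parameter values, function names, and the simplified all-to-all pairing scheme are illustrative assumptions, not taken from the paper.

```python
import math

# Illustrative parameters (assumed values, not from the original paper).
TAU_PLUS = 20.0    # ms, potentiation window time constant
TAU_MINUS = 20.0   # ms, depression window time constant
A_PLUS = 1.0       # potentiation amplitude
A_MINUS = 1.0      # depression amplitude
TAU_E = 500.0      # ms, eligibility-trace decay time constant
LR = 0.01          # learning rate

def stdp(dt):
    """Standard exponential STDP window; dt = t_post - t_pre (ms)."""
    if dt >= 0:
        return A_PLUS * math.exp(-dt / TAU_PLUS)    # pre before post: potentiate
    return -A_MINUS * math.exp(dt / TAU_MINUS)      # post before pre: depress

def run_synapse(pre_spikes, post_spikes, reward_time, reward, w0=0.5):
    """Accumulate pairwise STDP contributions into an eligibility trace
    that decays until the reward arrives, then apply the weight change
    proportional to reward * trace (modulated STDP with eligibility trace)."""
    e = 0.0
    for tp in pre_spikes:
        for tq in post_spikes:
            t_evt = max(tp, tq)                     # pairing is complete at the later spike
            if t_evt <= reward_time:
                # each pairing's contribution decays from its own event time
                e += stdp(tq - tp) * math.exp(-(reward_time - t_evt) / TAU_E)
    return w0 + LR * reward * e
```

For example, a pre spike at 0 ms followed by a post spike at 10 ms leaves a positive trace, so a positive reward delivered later at 200 ms strengthens the synapse, while a negative reward weakens it; reversing the spike order reverses the sign of the trace.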