Reinforcement Learning, Spike-Time-Dependent Plasticity, and the BCM Rule

  • Authors:
  • Dorit Baras; Ron Meir

  • Affiliations:
  • Dorit Baras (doritb@il.ibm.com); Ron Meir, Department of Electrical Engineering, Technion, Haifa 32000, Israel (rmeir@ee.technion.ac.il)

  • Venue:
  • Neural Computation
  • Year:
  • 2007

Abstract

Learning agents, whether natural or artificial, must update their internal parameters in order to improve their behavior over time. In reinforcement learning, this plasticity is influenced by an environmental signal, termed a reward, that directs the changes in appropriate directions. We apply a recently introduced policy learning algorithm from machine learning to networks of spiking neurons and derive a spike-time-dependent plasticity rule that ensures convergence to a local optimum of the expected average reward. The approach is applicable to a broad class of neuronal models, including the Hodgkin-Huxley model. We demonstrate the effectiveness of the derived rule in several toy problems. Finally, through statistical analysis, we show that the derived synaptic plasticity rule is closely related to the widely used BCM rule, for which good biological evidence exists.
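The reward-modulated spike-time-dependent plasticity described in the abstract can be illustrated with a minimal sketch. This is not the paper's exact derivation; it is a common simplified form in which an eligibility trace accumulates STDP-like pre/post spike-timing correlations and the weight update is the trace gated by the reward signal. All function names, constants, and the exponential STDP window below are illustrative assumptions.

```python
import numpy as np

# Illustrative constants (assumptions, not taken from the paper)
TAU_E = 50.0                      # eligibility-trace time constant (ms)
A_PLUS, A_MINUS = 0.01, 0.012     # STDP potentiation/depression amplitudes
TAU_PLUS, TAU_MINUS = 20.0, 20.0  # STDP window time constants (ms)
ETA = 0.1                         # learning rate
DT = 1.0                          # simulation time step (ms)

def stdp_kernel(delta_t):
    """Exponential STDP window: potentiate when pre precedes post
    (delta_t = t_post - t_pre >= 0), depress otherwise."""
    if delta_t >= 0:
        return A_PLUS * np.exp(-delta_t / TAU_PLUS)
    return -A_MINUS * np.exp(delta_t / TAU_MINUS)

def update(w, e, delta_t, reward):
    """One reward-modulated step: decay the eligibility trace,
    add the STDP contribution, then gate the weight change by reward."""
    e = e * np.exp(-DT / TAU_E) + stdp_kernel(delta_t)
    w = w + ETA * reward * e
    return w, e

w, e = 0.5, 0.0
# Rewarded causal (pre-before-post) pairings should potentiate the synapse
for _ in range(100):
    w, e = update(w, e, delta_t=5.0, reward=1.0)
print(f"weight after rewarded causal pairings: {w:.3f}")
```

With reward fixed at zero the trace decays and the weight stays put, which is the qualitative signature of reward-gated plasticity; the paper's contribution is deriving such a rule from a policy-gradient objective rather than postulating it.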