Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning

Authors: Ronald J. Williams
Affiliations: College of Computer Science, 161 CN, Northeastern University, 360 Huntington Ave., Boston, MA 02115. rjw@corwin.ccs.northeastern.edu
Venue: Machine Learning
Year: 1992



Abstract

This article presents a general class of associative reinforcement learning algorithms for connectionist networks containing stochastic units. These algorithms, called REINFORCE algorithms, are shown to make weight adjustments in a direction that lies along the gradient of expected reinforcement in both immediate-reinforcement tasks and certain limited forms of delayed-reinforcement tasks, and they do this without explicitly computing gradient estimates or even storing information from which such estimates could be computed. Specific examples of such algorithms are presented, some of which bear a close relationship to certain existing algorithms while others are novel but potentially interesting in their own right. Also given are results that show how such algorithms can be naturally integrated with backpropagation. We close with a brief discussion of a number of additional issues surrounding the use of such algorithms, including what is known about their limiting behaviors as well as further considerations that might be used to help develop similar but potentially more powerful reinforcement learning algorithms.
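To make the abstract concrete, here is a minimal sketch of a REINFORCE-style update for a single Bernoulli-logistic unit in an immediate-reinforcement task. It assumes the weight update takes the form Δw_j = α (r − b) e_j, where e_j = ∂ ln g / ∂ w_j is the characteristic eligibility and g is the probability of the output the unit actually emitted; for a Bernoulli-logistic unit this eligibility works out to (y − p) x_j. The function names, the toy reward, the running-average baseline, and all constants are illustrative assumptions, not taken from the article.

```python
import math
import random


def firing_prob(w, x):
    """Probability that a Bernoulli-logistic unit with weights w emits 1 on input x."""
    s = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-s))


def match_input_reward(x, y):
    """Hypothetical toy task: reward 1 when the output equals the input bit x[1]."""
    return 1.0 if y == x[1] else 0.0


def reinforce_bernoulli(reward_fn, n_inputs, steps=5000, alpha=0.1, seed=0):
    """Train one stochastic unit with a REINFORCE-style rule (illustrative sketch)."""
    rng = random.Random(seed)
    w = [0.0] * n_inputs
    baseline = 0.0  # assumed baseline b: running average of past rewards
    for _ in range(steps):
        # x[0] is a constant bias input; the remaining inputs are random bits.
        x = [1.0] + [rng.choice([0.0, 1.0]) for _ in range(n_inputs - 1)]
        p = firing_prob(w, x)
        y = 1.0 if rng.random() < p else 0.0  # sample the stochastic output
        r = reward_fn(x, y)
        # Characteristic eligibility of a Bernoulli-logistic unit: (y - p) * x_j.
        for j in range(n_inputs):
            w[j] += alpha * (r - baseline) * (y - p) * x[j]
        # Update the baseline after the weights so it depends only on earlier trials.
        baseline += 0.05 * (r - baseline)
    return w


if __name__ == "__main__":
    weights = reinforce_bernoulli(match_input_reward, n_inputs=2)
    print("P(y=1 | x1=1):", firing_prob(weights, [1.0, 1.0]))
    print("P(y=1 | x1=0):", firing_prob(weights, [1.0, 0.0]))
```

Note that the loop never forms or stores a gradient estimate: each step uses only the sampled output, the received reward, and locally available quantities, yet the expected weight change follows the gradient of expected reward. The baseline changes the variance of the updates but not their expected direction, provided it does not depend on the current output.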