Efficient gradient estimation for motor control learning

Authors:
Gregory Lawrence;Noah Cowan;Stuart Russell
Affiliations:
Computer Science Division, U.C. Berkeley;Dept. of Mechanical Engineering, Johns Hopkins University;Computer Science Division, U.C. Berkeley
Venue:
UAI'03 Proceedings of the Nineteenth conference on Uncertainty in Artificial Intelligence
Year:
2003

Abstract

The task of estimating the gradient of a function in the presence of noise is central to several forms of reinforcement learning, including policy search methods. We present two techniques for reducing gradient estimation errors in the presence of observable input noise applied to the control signal. The first method extends the idea of a reinforcement baseline by fitting a local model to the response function whose gradient is being estimated; we show how to find the response surface model that minimizes the variance of the gradient estimate, and how to estimate the model from data. The second method improves this further by discounting components of the gradient vector that have high variance. These methods are applied to the problem of motor control learning, where actuator noise has a significant influence on behavior. In particular, we apply the techniques to learn locally optimal controllers for a dart-throwing task using a simulated three-link arm; we demonstrate that the proposed methods significantly improve the response function gradient estimate and, consequently, the learning curve, over existing methods.
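The variance-reduction idea in the abstract can be illustrated with a small numerical sketch. This is not the authors' algorithm or their dart-throwing setup. It is a minimal one-dimensional example under stated assumptions: a hypothetical response function `response`, Gaussian noise of known standard deviation `sigma` added to a scalar control `theta`, and three simple estimators of the gradient of the expected response. The first uses no baseline, the second subtracts a constant baseline (the sample mean), and the third reads the gradient off a least-squares linear fit of response against the observed noise, which is one simple stand-in for a local response surface model. All function and parameter names are illustrative.

```python
import random
import statistics


def response(u):
    # Hypothetical response function (an assumption, not from the paper):
    # a smooth score that peaks at u = 2.
    return -(u - 2.0) ** 2


def draw(theta, sigma, n, rng):
    # Observable noise added to the control, and the resulting responses.
    eps = [rng.gauss(0.0, sigma) for _ in range(n)]
    return eps, [response(theta + e) for e in eps]


def grad_plain(eps, f, sigma):
    # Score-function estimate with no baseline: mean of f * eps / sigma^2.
    return sum(fi * e for fi, e in zip(f, eps)) / (len(eps) * sigma ** 2)


def grad_constant_baseline(eps, f, sigma):
    # Same estimate after subtracting the sample-mean response as a baseline.
    b = statistics.mean(f)
    return sum((fi - b) * e for fi, e in zip(f, eps)) / (len(eps) * sigma ** 2)


def grad_linear_fit(eps, f):
    # Slope of the least-squares line f ~ a + g * eps, used as the gradient.
    me, mf = statistics.mean(eps), statistics.mean(f)
    cov = sum((e - me) * (fi - mf) for e, fi in zip(eps, f))
    var = sum((e - me) ** 2 for e in eps)
    return cov / var


def compare_estimators(trials=200, n=50, theta=0.0, sigma=0.1, seed=0):
    # Repeat the experiment and report (mean, standard deviation) per estimator.
    rng = random.Random(seed)
    runs = {"plain": [], "constant": [], "linear": []}
    for _ in range(trials):
        eps, f = draw(theta, sigma, n, rng)
        runs["plain"].append(grad_plain(eps, f, sigma))
        runs["constant"].append(grad_constant_baseline(eps, f, sigma))
        runs["linear"].append(grad_linear_fit(eps, f))
    return {k: (statistics.mean(v), statistics.stdev(v)) for k, v in runs.items()}


if __name__ == "__main__":
    for name, (mean, spread) in compare_estimators().items():
        print(f"{name:>8}: mean={mean:.3f}  std={spread:.3f}")
```

For this toy response the gradient of the expected response at `theta = 0` is 4, so the spread of each estimator around that value gives a rough sense of how much a better baseline helps. The ordering seen here depends on the assumed response function and noise level and should not be read as a result from the paper.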