Response Surface Methodology: Process and Product in Optimization Using Designed Experiments
Response Surface Methodology: Process and Product in Optimization Using Designed Experiments
PEGASUS: A policy search method for large MDPs and POMDPs
UAI '00 Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence
Policy Improvement for POMDPs Using Normalized Importance Sampling
UAI '01 Proceedings of the 17th Conference in Uncertainty in Artificial Intelligence
The Optimal Reward Baseline for Gradient-Based Reinforcement Learning
UAI '01 Proceedings of the 17th Conference in Uncertainty in Artificial Intelligence
Reinforcement learning: a survey
Journal of Artificial Intelligence Research
Infinite-horizon policy-gradient estimation
Journal of Artificial Intelligence Research
Reinforcement learning to adjust robot movements to new situations
IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Volume Three
Hi-index | 0.00 |
The task of estimating the gradient of a function in the presence of noise is central to several forms of reinforcement learning, including policy search methods. We present two techniques for reducing gradient estimation errors in the presence of observable input noise applied to the control signal. The first method extends the idea of a reinforcement baseline by fitting a local model to the response function whose gradient is being estimated; we show how to find the response surface model that minimizes the variance of the gradient estimate, and how to estimate the model from data. The second method improves this further by discounting components of the gradient vector that have high variance. These methods are applied to the problem of motor control learning, where actuator noise has a significant influence on behavior. In particular, we apply the techniques to learn locally optimal controllers for a dart-throwing task using a simulated three-link arm; we demonstrate that the proposed methods significantly improve the response function gradient estimate and, consequently, the learning curve, over existing methods.