The policy gradient approach is a flexible and powerful reinforcement learning method, particularly for problems with continuous actions such as robot control. A common challenge is reducing the variance of policy gradient estimates so that policy updates are reliable. In this letter, we combine the following three ideas to obtain a highly effective policy gradient method: (1) policy gradients with parameter-based exploration (PGPE), a recently proposed policy-search method whose gradient estimates have low variance; (2) an importance sampling technique, which allows us to reuse previously gathered data in a consistent way; and (3) an optimal baseline, which minimizes the variance of gradient estimates while keeping them unbiased. For the proposed method, we give a theoretical analysis of the variance of gradient estimates and demonstrate its usefulness through extensive experiments.
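To make the combination concrete, the following is a minimal sketch (not the authors' code) of how the three ingredients might fit together: a PGPE-style score-function gradient for the mean of a Gaussian hyper-distribution over policy parameters, importance weights for reusing samples drawn under an earlier hyper-distribution, and a variance-optimal scalar baseline of the general form b = E[R w^2 ||g||^2] / E[w^2 ||g||^2]. All names (pgpe_gradient, rho, sigma, and so on) are illustrative assumptions, not identifiers from the paper.

    import numpy as np

    def pgpe_gradient(thetas, returns, rho, sigma, rho_old, sigma_old):
        # Importance-weighted PGPE gradient estimate for the mean rho of a
        # Gaussian hyper-distribution N(rho, sigma^2) over policy parameters.
        #   thetas  : (N, D) parameters sampled earlier from N(rho_old, sigma_old^2)
        #   returns : (N,)   rollout returns obtained with those parameters
        def log_gauss(theta, m, s):
            return np.sum(-0.5 * ((theta - m) / s) ** 2 - np.log(s), axis=-1)

        # (2) Importance weights: consistently reuse data gathered under the
        # old hyper-distribution.
        w = np.exp(log_gauss(thetas, rho, sigma)
                   - log_gauss(thetas, rho_old, sigma_old))

        # (1) PGPE score function for the mean: grad_rho log N(theta | rho, sigma^2).
        g = (thetas - rho) / sigma ** 2                      # shape (N, D)

        # (3) Variance-optimal scalar baseline, estimated from the sample:
        # b = E[R w^2 ||g||^2] / E[w^2 ||g||^2].
        wg2 = (w ** 2) * np.sum(g ** 2, axis=1)              # shape (N,)
        b = np.sum(returns * wg2) / (np.sum(wg2) + 1e-12)

        # Baseline-corrected, importance-weighted gradient estimate of the
        # expected return with respect to rho.
        return np.mean((w * (returns - b))[:, None] * g, axis=0)

With rho = rho_old and sigma = sigma_old, all weights equal one and this reduces to an on-policy PGPE estimator with an optimal baseline; updating rho by a small step along the returned gradient would be the corresponding policy update.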