Reward-weighted regression with sample reuse for direct policy search in reinforcement learning

  • Authors:
  • Hirotaka Hachiya; Jan Peters; Masashi Sugiyama

  • Venue:
  • Neural Computation
  • Year:
  • 2011

Abstract

Direct policy search is a promising reinforcement learning framework, in particular for controlling continuous, high-dimensional systems. However, policy search often requires a large number of samples to obtain a stable policy update estimator, which is prohibitive when sampling is expensive. In this letter, we extend an expectation-maximization-based policy search method so that previously collected samples can be efficiently reused. The usefulness of the proposed method, reward-weighted regression with sample reuse (R³), is demonstrated through robot learning experiments. (This letter is an extended version of our earlier conference paper: Hachiya, Peters, & Sugiyama, 2009.)
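
For intuition only, below is a minimal sketch (not from the letter itself) of how an importance-weighted reward-weighted regression update might look for a linear-Gaussian policy. The function name `rwr_update`, the feature matrix `phi`, and the small regularization constant are illustrative assumptions; the paper's actual estimator and weighting scheme may differ in detail.

```python
import numpy as np

def rwr_update(phi, actions, rewards, theta, sigma, theta_old, sigma_old):
    """Sketch of one EM-style reward-weighted regression update that
    reuses samples drawn from an older behavior policy via importance
    weighting (illustrative, not the paper's exact algorithm).

    phi      : (n, d) state features
    actions  : (n,)   actions taken under the behavior policy
    rewards  : (n,)   nonnegative rewards/returns
    theta, sigma         : current Gaussian policy N(theta^T phi, sigma^2)
    theta_old, sigma_old : behavior policy that generated the samples
    """
    def gauss_pdf(a, mean, s):
        return np.exp(-0.5 * ((a - mean) / s) ** 2) / (np.sqrt(2 * np.pi) * s)

    # Importance weights correct for the mismatch between the current
    # policy and the older policy the stored samples came from.
    iw = (gauss_pdf(actions, phi @ theta, sigma)
          / gauss_pdf(actions, phi @ theta_old, sigma_old))

    # The EM M-step reduces to a weighted least-squares regression of
    # actions on features, weighted by reward times importance weight.
    w = rewards * iw
    A = phi.T @ (w[:, None] * phi) + 1e-8 * np.eye(phi.shape[1])  # regularized
    theta_new = np.linalg.solve(A, phi.T @ (w * actions))
    sigma_new = np.sqrt(np.sum(w * (actions - phi @ theta_new) ** 2) / np.sum(w))
    return theta_new, sigma_new

# Illustrative usage with synthetic data (shapes and reward are made up).
rng = np.random.default_rng(0)
phi = rng.normal(size=(100, 3))
theta_old = np.zeros(3)
actions = phi @ theta_old + 0.5 * rng.normal(size=100)
rewards = np.exp(-(actions - phi @ np.ones(3)) ** 2)
theta, sigma = rwr_update(phi, actions, rewards,
                          theta_old, 0.5, theta_old, 0.5)
```

The key point the sketch illustrates is that sample reuse enters only through the per-sample weights: old data is kept, and its contribution to the regression is rescaled rather than discarded.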