Efficient Sample Reuse in EM-Based Policy Search

  • Authors:
  • Hirotaka Hachiya; Jan Peters; Masashi Sugiyama

  • Affiliations:
  • Tokyo Institute of Technology, Tokyo, Japan 152-8552; Max-Planck Institute for Biological Cybernetics, Tübingen, Germany 72076; Tokyo Institute of Technology, Tokyo, Japan 152-8552

  • Venue:
  • ECML PKDD '09 Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases: Part I
  • Year:
  • 2009

Abstract

Direct policy search is a promising reinforcement learning framework, particularly for control of continuous, high-dimensional systems such as anthropomorphic robots. Due to its high flexibility, policy search often requires a large number of samples to obtain a stable policy-update estimator; this is prohibitive when sampling is expensive. In this paper, we extend an EM-based policy search method so that previously collected samples can be efficiently reused. The usefulness of the proposed method, called Reward-weighted Regression with sample Reuse (R3), is demonstrated through a robot learning experiment.
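To make the idea concrete, below is a minimal sketch of one reward-weighted regression step with importance-weighted sample reuse. This is an illustrative toy, not the paper's R3 algorithm: the linear-Gaussian policy, the function name `rwr_update`, and the synthetic data are all assumptions for exposition. The key point it shows is that each sample's contribution is scaled by its reward times an importance weight (the density ratio between the current and the data-collecting policy), so that old samples can enter the regression without biasing the update.

```python
import numpy as np

rng = np.random.default_rng(0)

def rwr_update(states, actions, rewards, importance_weights):
    """One reward-weighted regression (RWR) step for a linear-Gaussian policy.

    Solves a weighted least-squares problem in which each sample is
    weighted by its reward times an importance weight pi_new/pi_old,
    correcting for samples drawn under a previous policy.
    """
    w = rewards * importance_weights        # combined per-sample weights
    W = np.diag(w)
    Phi = states                            # linear features: phi(s) = s
    # Closed-form weighted regression: theta = (Phi^T W Phi)^-1 Phi^T W a
    return np.linalg.solve(Phi.T @ W @ Phi, Phi.T @ W @ actions)

# Toy data: actions generated near a "good" linear policy plus noise.
states = rng.normal(size=(100, 3))
true_theta = np.array([1.0, -0.5, 2.0])
actions = states @ true_theta + 0.1 * rng.normal(size=100)
rewards = np.exp(-(actions - states @ true_theta) ** 2)  # high reward for good actions
iw = np.ones(100)  # importance weights; all 1 here since no behavior-policy mismatch
theta = rwr_update(states, actions, rewards, iw)
```

For reused samples, `iw` would instead hold the ratio of the current policy density to the density of the policy that generated each sample, which is the sample-reuse mechanism the abstract refers to.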