Efficient Sample Reuse in EM-Based Policy Search

  • Authors:
  • Hirotaka Hachiya; Jan Peters; Masashi Sugiyama

  • Affiliations:
  • Tokyo Institute of Technology, Tokyo, Japan 152-8552; Max-Planck Institute for Biological Cybernetics, Tübingen, Germany 72076; Tokyo Institute of Technology, Tokyo, Japan 152-8552

  • Venue:
  • ECML PKDD '09 Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases: Part I
  • Year:
  • 2009

Abstract

Direct policy search is a promising reinforcement learning framework, particularly for control of continuous, high-dimensional systems such as anthropomorphic robots. Due to its high flexibility, policy search often requires a large number of samples to obtain a stable policy-update estimator; this is prohibitive when sampling is expensive. In this paper, we extend an EM-based policy search method so that previously collected samples can be efficiently reused. The usefulness of the proposed method, called Reward-weighted Regression with sample Reuse (R3), is demonstrated through a robot learning experiment.
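To make the idea concrete, below is a minimal sketch of one reward-weighted regression step with importance-weighted sample reuse. This is an illustrative toy, not the paper's R3 algorithm: the linear-Gaussian policy, the function name `rwr_update`, and the synthetic data are all assumptions for exposition. The key point it shows is that each sample's contribution is scaled by its reward times an importance weight (the density ratio between the current and the data-collecting policy), so that old samples can enter the regression without biasing the update.

```python
import numpy as np

rng = np.random.default_rng(0)

def rwr_update(states, actions, rewards, importance_weights):
    """One reward-weighted regression (RWR) step for a linear-Gaussian policy.

    Solves a weighted least-squares problem in which each sample is
    weighted by its reward times an importance weight pi_new/pi_old,
    correcting for samples drawn under a previous policy.
    """
    w = rewards * importance_weights        # combined per-sample weights
    W = np.diag(w)
    Phi = states                            # linear features: phi(s) = s
    # Closed-form weighted regression: theta = (Phi^T W Phi)^-1 Phi^T W a
    return np.linalg.solve(Phi.T @ W @ Phi, Phi.T @ W @ actions)

# Toy data: actions generated near a "good" linear policy plus noise.
states = rng.normal(size=(100, 3))
true_theta = np.array([1.0, -0.5, 2.0])
actions = states @ true_theta + 0.1 * rng.normal(size=100)
rewards = np.exp(-(actions - states @ true_theta) ** 2)  # high reward for good actions
iw = np.ones(100)  # importance weights; all 1 here since no behavior-policy mismatch
theta = rwr_update(states, actions, rewards, iw)
```

For reused samples, `iw` would instead hold the ratio of the current policy density to the density of the policy that generated each sample, which is the sample-reuse mechanism the abstract refers to.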