Reward-weighted regression with sample reuse for direct policy search in reinforcement learning

  • Authors:
  • Hirotaka Hachiya; Jan Peters; Masashi Sugiyama

  • Venue:
  • Neural Computation
  • Year:
  • 2011

Abstract

Direct policy search is a promising reinforcement learning framework, in particular for controlling continuous, high-dimensional systems. However, policy search often requires a large number of samples to obtain a stable policy update estimator, which is prohibitive when sampling is expensive. In this letter, we extend an expectation-maximization-based policy search method so that previously collected samples can be efficiently reused. The usefulness of the proposed method, reward-weighted regression with sample reuse (R³), is demonstrated through robot learning experiments. (This letter is an extended version of our earlier conference paper: Hachiya, Peters, & Sugiyama, 2009.)
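
For intuition only, below is a minimal sketch (not from the letter itself) of how an importance-weighted reward-weighted regression update might look for a linear-Gaussian policy. The function name `rwr_update`, the feature matrix `phi`, and the small regularization constant are illustrative assumptions; the paper's actual estimator and weighting scheme may differ in detail.

```python
import numpy as np

def rwr_update(phi, actions, rewards, theta, sigma, theta_old, sigma_old):
    """Sketch of one EM-style reward-weighted regression update that
    reuses samples drawn from an older behavior policy via importance
    weighting (illustrative, not the paper's exact algorithm).

    phi      : (n, d) state features
    actions  : (n,)   actions taken under the behavior policy
    rewards  : (n,)   nonnegative rewards/returns
    theta, sigma         : current Gaussian policy N(theta^T phi, sigma^2)
    theta_old, sigma_old : behavior policy that generated the samples
    """
    def gauss_pdf(a, mean, s):
        return np.exp(-0.5 * ((a - mean) / s) ** 2) / (np.sqrt(2 * np.pi) * s)

    # Importance weights correct for the mismatch between the current
    # policy and the older policy the stored samples came from.
    iw = (gauss_pdf(actions, phi @ theta, sigma)
          / gauss_pdf(actions, phi @ theta_old, sigma_old))

    # The EM M-step reduces to a weighted least-squares regression of
    # actions on features, weighted by reward times importance weight.
    w = rewards * iw
    A = phi.T @ (w[:, None] * phi) + 1e-8 * np.eye(phi.shape[1])  # regularized
    theta_new = np.linalg.solve(A, phi.T @ (w * actions))
    sigma_new = np.sqrt(np.sum(w * (actions - phi @ theta_new) ** 2) / np.sum(w))
    return theta_new, sigma_new

# Illustrative usage with synthetic data (shapes and reward are made up).
rng = np.random.default_rng(0)
phi = rng.normal(size=(100, 3))
theta_old = np.zeros(3)
actions = phi @ theta_old + 0.5 * rng.normal(size=100)
rewards = np.exp(-(actions - phi @ np.ones(3)) ** 2)
theta, sigma = rwr_update(phi, actions, rewards,
                          theta_old, 0.5, theta_old, 0.5)
```

The key point the sketch illustrates is that sample reuse enters only through the per-sample weights: old data is kept, and its contribution to the regression is rescaled rather than discarded.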