Using expectation-maximization for reinforcement learning. Neural Computation.
Introduction to Reinforcement Learning.
Learning from Scarce Experience. ICML '02, Proceedings of the Nineteenth International Conference on Machine Learning.
Eligibility Traces for Off-Policy Policy Evaluation. ICML '00, Proceedings of the Seventeenth International Conference on Machine Learning.
Reinforcement Learning in Continuous Time and Space. Neural Computation.
Pattern Recognition and Machine Learning (Information Science and Statistics).
Springer Handbook of Robotics.
The Journal of Machine Learning Research.
Reinforcement learning by reward-weighted regression for operational space control. Proceedings of the 24th International Conference on Machine Learning.
Covariate Shift Adaptation by Importance Weighted Cross Validation. The Journal of Machine Learning Research.
Efficient Sample Reuse in EM-Based Policy Search. ECML PKDD '09, Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases: Part I.
Policy improvement for POMDPs using normalized importance sampling. UAI '01, Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence.
ECML '05, Proceedings of the 16th European Conference on Machine Learning.
Direct policy search is a promising reinforcement learning framework, particularly for controlling continuous, high-dimensional systems. However, policy search often requires a large number of samples to obtain a stable policy-update estimator, which is prohibitive when sampling is costly. In this letter, we extend an expectation-maximization-based policy search method so that previously collected samples can be reused efficiently. The usefulness of the proposed method, reward-weighted regression with sample reuse (R³), is demonstrated through robot learning experiments. (This letter is an extended version of our earlier conference paper: Hachiya, Peters, & Sugiyama, 2009.)
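The core idea of the abstract — correcting for the mismatch between the current policy and the policy that generated old samples, so that those samples can feed the EM (reward-weighted regression) update — can be sketched in a toy one-dimensional setting. Everything below is an assumption for illustration (the Gaussian policy, the toy reward peaking at a = 2, and all variable names); it is not the letter's actual algorithm or experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D setting: Gaussian policy a ~ N(theta, sigma^2).
sigma = 1.0
theta_old = 0.0   # policy mean that generated the stored samples
theta = 1.0       # current policy mean, after some earlier update

def gauss_pdf(a, mean):
    """Density of N(mean, sigma^2) at a."""
    return np.exp(-0.5 * ((a - mean) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Previously collected data: actions drawn from the OLD policy,
# with a toy nonnegative reward that peaks at a = 2.
actions = rng.normal(theta_old, sigma, size=5000)
rewards = np.exp(-0.5 * (actions - 2.0) ** 2)

# Importance weights correct for the mismatch between the current
# policy and the data-generating policy, enabling sample reuse.
iw = gauss_pdf(actions, theta) / gauss_pdf(actions, theta_old)

# Reward-weighted regression (EM-style) update: the reward- and
# importance-weighted mean of the stored actions is the new policy mean.
w = rewards * iw
theta_new = np.sum(w * actions) / np.sum(w)

print(theta_new)  # moves from 1.0 toward the reward peak at 2.0
```

Without the importance weights, the update would be biased toward the old policy's sampling distribution; with them, old samples behave (in expectation) as if drawn from the current policy. In practice plain importance weights can have high variance, which is one motivation for the stabilized sample-reuse scheme the letter proposes.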