Direct policy search is a promising reinforcement learning framework, particularly for controlling continuous, high-dimensional systems such as anthropomorphic robots. However, because of its high flexibility, policy search often requires a large number of samples to obtain a stable estimate of the policy update, which is prohibitive when sampling is expensive. In this paper, we extend an EM-based policy search method so that previously collected samples can be reused efficiently. The usefulness of the proposed method, called Reward-weighted Regression with sample Reuse (R3), is demonstrated through a robot learning experiment.
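To make the idea concrete, the sketch below shows a minimal, hypothetical version of reward-weighted regression in which samples collected under earlier policies are reused via importance weighting. It assumes a linear-Gaussian policy and an exponentiated-reward weighting; the function names (`rwr_update`, `gaussian_log_pdf`), the specific reward transformation, and the importance-weight estimator are illustrative assumptions and not the exact R3 formulation.

```python
# Minimal sketch (assumed details): reward-weighted regression with
# importance-weighted reuse of previously collected samples.
import numpy as np


def gaussian_log_pdf(a, mean, sigma):
    """Log-density of a 1-D Gaussian policy a ~ N(mean, sigma^2)."""
    return -0.5 * np.log(2.0 * np.pi * sigma ** 2) - (a - mean) ** 2 / (2.0 * sigma ** 2)


def rwr_update(states, actions, rewards, behavior_params, current_params, sigma=0.1):
    """One reward-weighted-regression update of a linear-Gaussian policy
    a = theta^T s + noise, reusing samples drawn under older policies.

    behavior_params holds, for each sample, the parameter vector of the
    policy that generated it; it is used to form importance weights
    w = pi_current(a|s) / pi_behavior(a|s).
    """
    states = np.asarray(states)    # shape (n, d)
    actions = np.asarray(actions)  # shape (n,)
    rewards = np.asarray(rewards)  # shape (n,)

    # Importance weights correct for the mismatch between the policies that
    # collected the data and the current policy.
    log_p_cur = gaussian_log_pdf(actions, states @ current_params, sigma)
    log_p_beh = np.array([
        gaussian_log_pdf(a, s @ th, sigma)
        for a, s, th in zip(actions, states, behavior_params)
    ])
    iw = np.exp(log_p_cur - log_p_beh)

    # Exponentiated rewards act as (pseudo-)posterior weights in the EM view;
    # shifting by the maximum keeps the exponentials numerically stable.
    rw = np.exp(rewards - rewards.max())

    # Combined weights: reward weighting times importance weighting.
    w = rw * iw

    # Weighted least squares: argmin_theta sum_i w_i (a_i - theta^T s_i)^2
    W = np.diag(w)
    A = states.T @ W @ states + 1e-6 * np.eye(states.shape[1])
    b = states.T @ W @ actions
    return np.linalg.solve(A, b)
```

The key design point is that the weighted regression uses all past samples rather than only those from the latest policy, with the importance weights compensating for the distribution shift; without them, reused samples would bias the policy update.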