Using Expectation-Maximization for Reinforcement Learning
Neural Computation
Robot Dynamics Algorithm
Scalable Techniques from Nonparametric Statistics for Real Time Robot Learning
Applied Intelligence
Introduction to Stochastic Search and Optimization
MOSAIC Model for Sensorimotor Learning and Control
Neural Computation
Reinforcement Learning: A Survey
Journal of Artificial Intelligence Research
Learning to Control in Operational Space
International Journal of Robotics Research
Episodic Reinforcement Learning by Logistic Reward-Weighted Regression
ICANN '08 Proceedings of the 18th International Conference on Artificial Neural Networks, Part I
Fitness Expectation Maximization
Proceedings of the 10th International Conference on Parallel Problem Solving from Nature: PPSN X
Efficient Sample Reuse in EM-Based Policy Search
ECML PKDD '09 Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases: Part I
Reinforcement Learning to Adjust Robot Movements to New Situations
IJCAI '11 Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence - Volume Three
Many robot control problems of practical importance, including operational space control, can be reformulated as immediate-reward reinforcement learning problems. However, few of the known optimization or reinforcement learning algorithms can be used for online learning control on robots: they are either prohibitively slow, do not scale to interesting domains of complex robots, or require trying out policies generated by random search, which is infeasible on a physical system. Using a generalization of the EM-based reinforcement learning framework suggested by Dayan & Hinton, we reduce the problem of learning with immediate rewards to a reward-weighted regression problem with an adaptive, integrated reward transformation for faster convergence. The resulting algorithm is efficient, learns smoothly without dangerous jumps in solution space, and works well in applications to complex, high degree-of-freedom robots.
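The central idea of the abstract — reducing immediate-reward reinforcement learning to a regression problem weighted by (transformed) rewards — can be illustrated with a minimal sketch. This toy problem, the linear Gaussian policy, and the exponential reward transformation with temperature `tau` are illustrative assumptions, not the paper's exact formulation:

```python
import math
import random

def reward_weighted_regression(states, actions, rewards, tau=1.0):
    """One EM-style update for a linear policy a = theta * s:
    weighted least squares where each sample's weight is its
    exponentially transformed reward (assumed transformation)."""
    weights = [math.exp(tau * r) for r in rewards]
    num = sum(w * s * a for w, s, a in zip(weights, states, actions))
    den = sum(w * s * s for w, s in zip(weights, states))
    return num / den

# Toy immediate-reward problem: the optimal action is a* = 2*s,
# and the reward penalizes squared distance from a*.
random.seed(0)
theta, sigma = 0.0, 0.5          # initial policy parameter, exploration noise
for _ in range(30):              # repeated E/M-style updates
    states = [random.uniform(-1.0, 1.0) for _ in range(200)]
    actions = [theta * s + random.gauss(0.0, sigma) for s in states]
    rewards = [-(a - 2.0 * s) ** 2 for s, a in zip(states, actions)]
    theta = reward_weighted_regression(states, actions, rewards)

print(f"learned theta: {theta:.2f}")  # should approach the optimum 2.0
```

Because each update is a closed-form weighted regression over samples drawn from the current policy, the parameter moves smoothly between iterations rather than jumping through solution space, which is the property the abstract emphasizes for physical robots.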