With the goal of generating more scalable algorithms with higher efficiency and fewer open parameters, reinforcement learning (RL) has recently moved towards combining classical techniques from optimal control and dynamic programming with modern learning techniques from statistical estimation theory. In this vein, this paper suggests using the framework of stochastic optimal control with path integrals to derive a novel approach to RL with parameterized policies. While solidly grounded in value function estimation and optimal control based on the stochastic Hamilton-Jacobi-Bellman (HJB) equations, policy improvement can be transformed into an approximation problem of a path integral which has no open algorithmic parameters other than the exploration noise. The resulting algorithm can be conceived of as model-based, semi-model-based, or even model-free, depending on how the learning problem is structured. The update equations carry no danger of numerical instabilities, as neither matrix inversions nor gradient learning rates are required. Our new algorithm exhibits interesting similarities with previous RL research in the framework of probability matching and provides intuition for why the somewhat heuristically motivated probability matching approach can actually perform well. Empirical evaluations demonstrate significant performance improvements over gradient-based policy learning and scalability to high-dimensional control problems. Finally, a learning experiment on a simulated 12 degree-of-freedom robot dog illustrates the functionality of our algorithm in a complex robot learning scenario. We believe that Policy Improvement with Path Integrals (PI2) currently offers one of the most efficient, numerically robust, and easy-to-implement algorithms for RL based on trajectory roll-outs.
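The core of the update described above can be illustrated with a minimal sketch: each roll-out's exploration noise is averaged with weights given by exponentiated (negative) costs, so no learning rate or matrix inversion appears. This is a simplified, hedged illustration of the idea, not the paper's full per-time-step formulation; the function name `pi2_update` and the `temperature` knob are assumptions introduced here for clarity.

```python
import numpy as np

def pi2_update(theta, rollout_cost, rollout_noise, temperature=10.0):
    """One PI2-style parameter update (simplified sketch).

    theta:         current policy parameters, shape (D,)
    rollout_cost:  total cost of each roll-out, shape (K,)
    rollout_noise: exploration noise applied in each roll-out, shape (K, D)
    temperature:   sensitivity of the exponentiated cost weighting
                   (an illustrative knob, not a parameter from the paper)
    """
    S = np.asarray(rollout_cost, dtype=float)
    eps = np.asarray(rollout_noise, dtype=float)
    # Map costs to probabilities via a softmax over negative, range-normalized
    # costs; low-cost roll-outs receive exponentially larger weight.
    z = -temperature * (S - S.min()) / (S.max() - S.min() + 1e-10)
    P = np.exp(z)
    P /= P.sum()
    # Probability-weighted average of the exploration noise: no gradient
    # learning rate and no matrix inversion are involved.
    return theta + P @ eps
```

With two roll-outs, the parameters move almost entirely toward the noise of the cheaper one, which is the sense in which the update is a probability-matching average rather than a gradient step.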