Covariant policy search

Authors:
J. Andrew Bagnell; Jeff Schneider
Affiliations:
Robotics Institute, Carnegie-Mellon University, Pittsburgh, PA; Robotics Institute, Carnegie-Mellon University, Pittsburgh, PA
Venue:
IJCAI'03: Proceedings of the 18th International Joint Conference on Artificial Intelligence
Year:
2003

Abstract

We investigate the problem of non-covariant behavior of policy gradient reinforcement learning algorithms. The policy gradient approach is amenable to analysis by information geometric methods. This leads us to propose a natural metric on controller parameterization that results from considering the manifold of probability distributions over paths induced by a stochastic controller. Investigation of this approach leads to a covariant gradient ascent rule. Interesting properties of this rule are discussed, including its relation to actor-critic style reinforcement learning algorithms. The algorithms discussed here are computationally quite efficient and, on some interesting problems, lead to dramatic performance improvement over non-covariant rules.
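
To make the idea of a covariant gradient ascent rule concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a hypothetical one-step problem with a softmax policy over a few actions, so that a "path" is a single action and the Fisher information of the action distribution plays the role of the metric on the parameters. The function names, the reward vector, and the step size are illustrative assumptions.

```python
import numpy as np

def softmax(theta):
    """Action probabilities of a softmax policy with logits theta."""
    z = np.exp(theta - theta.max())
    return z / z.sum()

def natural_gradient_step(theta, rewards, step=0.1):
    """One covariant ascent step for a one-step softmax policy (toy setup)."""
    pi = softmax(theta)
    # Score function: row a holds d log pi(a) / d theta = e_a - pi.
    scores = np.eye(len(theta)) - pi
    # Ordinary policy gradient of expected reward: E_a[r(a) * score(a)].
    grad = scores.T @ (pi * rewards)
    # Fisher information matrix: E_a[score(a) score(a)^T].
    fisher = scores.T @ (pi[:, None] * scores)
    # The softmax Fisher matrix is singular (adding a constant to all logits
    # leaves the policy unchanged), so the pseudo-inverse is used here.
    nat_grad = np.linalg.pinv(fisher) @ grad
    return theta + step * nat_grad, grad, nat_grad

# Hypothetical example values, chosen only for illustration.
theta = np.array([0.5, -0.2, 0.1])
rewards = np.array([1.0, 0.0, 2.0])
new_theta, grad, nat_grad = natural_gradient_step(theta, rewards)
```

In this toy setting the ordinary gradient is scaled by the current action probabilities, while the metric-corrected direction reduces to the mean-centered rewards, which does not depend on how likely each action currently is. That is one simple way to see why rescaling by the Fisher matrix can change behavior substantially compared with a plain gradient step.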