We investigate the problem of non-covariant behavior of policy gradient reinforcement learning algorithms. The policy gradient approach is amenable to analysis by information-geometric methods. This leads us to propose a natural metric on the controller parameterization, obtained by considering the manifold of probability distributions over paths induced by a stochastic controller. Investigating this approach leads to a covariant gradient ascent rule. We discuss interesting properties of this rule, including its relation to actor-critic style reinforcement learning algorithms. The algorithms discussed here are computationally quite efficient and, on some interesting problems, lead to dramatic performance improvement over non-covariant rules.
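As a rough illustration of what a covariant gradient ascent rule looks like, the sketch below preconditions the ordinary policy gradient with the inverse of a Fisher information matrix. It is a minimal example under simplifying assumptions, not the algorithm from the paper: the setting is a single-step problem (a bandit) in which a "path" is just one action, the policy is a softmax over actions, and the expected rewards are known exactly. The names `softmax`, `natural_gradient_step`, `step_size`, and `damping` are hypothetical and chosen only for this example.

```python
import numpy as np

def softmax(theta):
    """Numerically stable softmax over action preferences."""
    z = theta - np.max(theta)
    e = np.exp(z)
    return e / e.sum()

def natural_gradient_step(theta, rewards, step_size=0.1, damping=1e-3):
    """One covariant (natural) gradient ascent step for a softmax policy.

    Assumes a single-step problem with known expected reward per action,
    so all expectations over actions can be computed exactly.
    """
    n = len(theta)
    pi = softmax(theta)
    # Score function of action a: grad_theta log pi(a) = e_a - pi.
    # Row a of `scores` holds the score vector of action a.
    scores = np.eye(n) - pi
    # Ordinary policy gradient: E_a[ r(a) * score(a) ].
    grad = scores.T @ (pi * rewards)
    # Fisher information matrix: E_a[ score(a) score(a)^T ].
    fisher = scores.T @ (pi[:, None] * scores)
    # The softmax Fisher matrix is singular (adding a constant to every
    # preference leaves the policy unchanged), so a small damping term
    # is added before solving. This damping is an assumption of the sketch.
    direction = np.linalg.solve(fisher + damping * np.eye(n), grad)
    return theta + step_size * direction

# Usage: repeated steps shift probability toward the best action.
theta = np.zeros(3)
rewards = np.array([1.0, 0.0, 0.5])
for _ in range(200):
    theta = natural_gradient_step(theta, rewards)
final_policy = softmax(theta)
```

Setting `fisher` to the identity matrix recovers the plain, non-covariant gradient step, whose direction depends on how the policy happens to be parameterized. In a multi-step Markov decision process the expectations would be taken over sampled paths rather than computed in closed form.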