Actor-critic architectures have become popular in reinforcement learning over the last decade thanks to the policy gradient theorem with function approximation. This theorem makes it possible to combine actor-critic architectures with value function approximation in a principled way, and therefore to address large-scale problems. More recent work has replaced the policy gradient with a natural policy gradient, improving the efficiency of the corresponding algorithms. However, a common drawback of these approaches is that they require manipulating the so-called advantage function, which does not satisfy any Bellman equation; as a consequence, deriving actor-critic algorithms is not straightforward. In this paper, we re-derive these theorems in a way that allows reasoning directly with the state-action value function (or Q-function), and thus relying on the Bellman equation again. As a result, new forms of critics can easily be integrated into the actor-critic framework.
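For context, a minimal sketch of the quantities the abstract refers to, written in standard textbook notation rather than the paper's own: the policy gradient theorem with function approximation, its natural-gradient variant, and the Bellman equation that the Q-function satisfies but the advantage function does not.

% Policy gradient theorem with function approximation:
% d^\pi is the stationary state distribution induced by the policy \pi_\theta.
\nabla_\theta J(\theta) = \sum_{s} d^{\pi}(s) \sum_{a} \nabla_\theta \pi_\theta(a \mid s)\, Q^{\pi}(s, a)

% Natural policy gradient: precondition the gradient by the inverse
% Fisher information matrix G(\theta) of the policy.
\widetilde{\nabla}_\theta J(\theta) = G(\theta)^{-1} \nabla_\theta J(\theta)

% The advantage function A^\pi(s,a) = Q^\pi(s,a) - V^\pi(s) obeys no Bellman
% equation of its own, whereas the Q-function does:
Q^{\pi}(s, a) = r(s, a) + \gamma \sum_{s'} P(s' \mid s, a) \sum_{a'} \pi_\theta(a' \mid s')\, Q^{\pi}(s', a')

Because the Q-function is a fixed point of this Bellman operator, any value-function-based critic (e.g., temporal-difference or least-squares methods) can estimate it directly, which is what motivates re-deriving the theorems in terms of Q rather than A.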