TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation.
Reinforcement learning with replacing eligibility traces. Machine Learning (special issue on reinforcement learning).
Natural gradient works efficiently in learning. Neural Computation.
Markov Decision Processes: Discrete Stochastic Dynamic Programming.
Neuro-Dynamic Programming.
Kernel-Based Reinforcement Learning. Machine Learning.
Open Theoretical Questions in Reinforcement Learning. EuroCOLT '99: Proceedings of the 4th European Conference on Computational Learning Theory.
SIAM Journal on Control and Optimization.
Tree-Based Batch Mode Reinforcement Learning. The Journal of Machine Learning Research.
Finite-Time Bounds for Fitted Value Iteration. The Journal of Machine Learning Research.
ECML'05: Proceedings of the 16th European Conference on Machine Learning.
Teaching a robot to perform tasks with voice commands. MICAI'10: Proceedings of the 9th Mexican International Conference on Advances in Artificial Intelligence, Part I.
Reinforcement learning with a bilinear Q function. EWRL'11: Proceedings of the 9th European Conference on Recent Advances in Reinforcement Learning.
Policy iteration based on a learned transition model. ECML PKDD'12: Proceedings of the 2012 European Conference on Machine Learning and Knowledge Discovery in Databases, Part II.
In this paper we address reinforcement learning problems with continuous state-action spaces. We propose a new algorithm, fitted natural actor-critic (FNAC), that extends the work in [1] to allow for general function approximation and data reuse. We combine the natural actor-critic architecture of [1] with a variant of fitted value iteration that uses importance sampling. The resulting method combines the appealing features of both approaches while overcoming their main weaknesses: the gradient-based actor readily overcomes the difficulties that regression methods face when optimizing policies over continuous action spaces; in turn, the regression-based critic makes efficient use of data and avoids the convergence problems that TD-based critics often exhibit. We establish the convergence of our algorithm and illustrate its application in a simple continuous-state, continuous-action problem.
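To make the architecture concrete, below is a minimal numpy sketch of an FNAC-style loop, not the paper's implementation: the 1-D toy task, the feature maps phi and f_v, and all hyperparameters are illustrative assumptions. It shows the three ingredients the abstract names: a regression-based critic (a few sweeps of fitted policy evaluation), importance weights pi_current/pi_behaviour that let older samples be reused, and a natural-gradient actor update obtained by regressing the advantage onto the compatible features psi = grad_theta log pi, whose fitted weights are the natural-gradient direction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D toy task (an assumption, not the paper's benchmark):
# drive the state towards 0 under a small action penalty.
def env_step(s, a):
    s_next = np.clip(s + a, -2.0, 2.0)
    return s_next, -s_next ** 2 - 0.1 * a ** 2

def phi(s):                        # features for the policy mean
    return np.array([s, 1.0])

def f_v(s):                        # features for the regression-based critic
    return np.array([s ** 2, s, 1.0])

def log_pi(s, a, theta, sigma):    # log-density of a ~ N(theta . phi(s), sigma^2)
    mean = theta @ phi(s)
    return -0.5 * ((a - mean) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)

gamma, sigma, alpha = 0.95, 0.3, 0.1
theta = np.zeros(2)                # actor parameters
buffer = []                        # transitions kept across iterations (data reuse)

for iteration in range(30):
    # 1) Collect transitions under the current policy, storing the behaviour
    #    log-density so samples can be reweighted under later policies.
    s = rng.uniform(-2.0, 2.0)
    for t in range(200):
        a = theta @ phi(s) + sigma * rng.standard_normal()
        s_next, r = env_step(s, a)
        buffer.append((s, a, r, s_next, log_pi(s, a, theta, sigma)))
        s = rng.uniform(-2.0, 2.0) if t % 20 == 19 else s_next
    buffer = buffer[-1000:]        # sliding window of recent experience

    S, A, R, S2, LOGB = (np.array(col) for col in zip(*buffer))
    # Importance weights pi_current / pi_behaviour enable reuse of old data.
    log_cur = np.array([log_pi(s, a, theta, sigma) for s, a in zip(S, A)])
    w_is = np.exp(np.clip(log_cur - LOGB, -5.0, 5.0))
    sw = np.sqrt(w_is)             # sqrt-weights for weighted least squares

    # 2) Regression-based critic: a few sweeps of fitted policy evaluation
    #    for V, then the advantage as the one-step TD residual.
    F = np.array([f_v(s) for s in S])
    F2 = np.array([f_v(s) for s in S2])
    v = np.zeros(3)
    for _ in range(20):
        targets = R + gamma * (F2 @ v)
        v, *_ = np.linalg.lstsq(F * sw[:, None], targets * sw, rcond=None)
    adv = R + gamma * (F2 @ v) - F @ v

    # 3) Natural-gradient actor: regress the advantage on the compatible
    #    features psi = grad_theta log pi = (a - theta.phi(s)) phi(s) / sigma^2;
    #    the fitted weights are then the natural gradient of expected return.
    PSI = np.array([(a - theta @ phi(s)) * phi(s) / sigma ** 2
                    for s, a in zip(S, A)])
    w_nat, *_ = np.linalg.lstsq(PSI * sw[:, None], adv * sw, rcond=None)
    theta += alpha * w_nat

print("learned policy parameters (mean = theta . [s, 1]):", theta)
```

The separation mirrors the abstract's argument: the actor never needs to maximize the critic over a continuous action set (it only follows the fitted natural-gradient weights), while the critic is a plain weighted regression and so reuses the whole buffer rather than discarding samples after one TD update.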