In reinforcement learning, the aim of the agent is to find a policy that maximizes its expected return. Policy gradient methods pursue this goal by directly approximating the policy with a parametric function approximator: the expected return of the current policy is estimated, and the policy parameters are updated by steepest ascent in the direction of the gradient of the expected return with respect to those parameters. In general, the policy is defined in terms of a set of basis functions that capture important features of the problem. Since the quality of the resulting policies depends directly on the set of basis functions, and defining them by hand becomes harder as the complexity of the problem increases, it is important to be able to construct them automatically. In this paper, we propose a new approach that uses the cascade-correlation learning architecture to automatically construct a set of basis functions within the context of Natural Actor-Critic (NAC) algorithms. Such basis functions allow more complex policies to be represented, and consequently improve the performance of the resulting policies. We also demonstrate the effectiveness of the method empirically.
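The two ingredients described above, cascade-correlation-style basis functions and a parametric policy built on top of them, can be sketched as follows. This is a hypothetical Python illustration, not the authors' implementation: the choice of `tanh` units and a softmax (Gibbs) policy over linear combinations of the features are illustrative assumptions. The defining cascade-correlation property is kept, though: each new unit takes as input the raw state plus the outputs of all previously added units, and its input weights are frozen once the unit is installed.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax.
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

class CascadeFeatures:
    """Toy sketch of cascade-correlation-style basis construction.

    Each added unit sees the raw state *and* the outputs of all earlier
    units; its input weights are frozen after installation, as in
    Fahlman and Lebiere's cascade-correlation architecture.
    """
    def __init__(self, state_dim):
        self.state_dim = state_dim
        self.units = []  # list of (frozen_weights, bias) pairs

    def add_unit(self, weights, bias):
        # The new unit's fan-in is the state plus every existing unit.
        assert len(weights) == self.state_dim + len(self.units)
        self.units.append((np.asarray(weights, dtype=float), float(bias)))

    def __call__(self, state):
        # Feature vector: raw state followed by each unit's output,
        # computed in the order the units were added (cascaded inputs).
        feats = list(np.asarray(state, dtype=float))
        for w, b in self.units:
            feats.append(np.tanh(np.dot(w, np.array(feats)) + b))
        return np.array(feats)

def policy_probs(theta, features, n_actions):
    """Softmax policy over linear combinations of the basis functions."""
    logits = theta.reshape(n_actions, -1) @ features
    return softmax(logits)
```

Adding a unit grows the policy's feature vector by one dimension, so the policy parameter vector `theta` grows accordingly; this is how richer basis sets let more complex policies be represented.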