Restricted gradient-descent algorithm for value-function approximation in reinforcement learning

  • Authors:
  • André da Motta Salles Barreto; Charles W. Anderson

  • Affiliations:
  • Programa de Engenharia Civil/COPPE, Universidade Federal do Rio de Janeiro, Rio de Janeiro, RJ, Brazil; Department of Computer Science, Colorado State University, Fort Collins, CO 80523, USA

  • Venue:
  • Artificial Intelligence
  • Year:
  • 2008

Abstract

This work presents the restricted gradient-descent (RGD) algorithm, a training method for local radial-basis function networks developed specifically for use in reinforcement learning. The RGD algorithm can be seen as a way to extract relevant features from the state space to feed a linear model that computes an approximation of the value function. Its basic idea is to restrict the way the standard gradient-descent algorithm changes the hidden units of the approximator, which results in conservative modifications that make the learning process less prone to divergence. The algorithm is also able to configure the topology of the network, an important characteristic in reinforcement learning, where the changing policy may impose different requirements on the approximator's structure. Computational experiments show that the RGD algorithm consistently generates better value-function approximations than the standard gradient-descent method, and that the latter is more susceptible to divergence. In the pole-balancing and Acrobot tasks, RGD combined with SARSA yields results competitive with other methods from the literature, including evolutionary and recent reinforcement-learning algorithms.
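
To make the idea concrete, the sketch below shows a radial-basis function value approximator updated with a SARSA-style TD error, in which changes to the hidden layer are deliberately restricted. This is only an illustration of the general principle described in the abstract, not the paper's algorithm: the specific restriction used here (moving only the basis center closest to the current state, with a small step size) and the unit-insertion rule (adding a center when no existing unit responds strongly) are assumptions made for the example.

```python
# Illustrative sketch, NOT the paper's exact RGD algorithm: an RBF value-function
# approximator with (i) standard gradient steps on the linear output weights,
# (ii) a restricted, conservative step on the hidden layer (only the closest
# Gaussian center moves), and (iii) topology configuration by adding a unit
# when the current state is poorly covered. All specific rules are assumptions.
import numpy as np


class RestrictedRBFApproximator:
    def __init__(self, state_dim, width=0.5, add_threshold=0.1,
                 lr_linear=0.1, lr_center=0.01):
        self.centers = np.empty((0, state_dim))   # hidden-unit centers
        self.weights = np.empty(0)                # linear output weights
        self.width = width                        # shared Gaussian width
        self.add_threshold = add_threshold        # coverage threshold for new units
        self.lr_linear = lr_linear
        self.lr_center = lr_center                # deliberately small (conservative) step

    def _activations(self, s):
        if len(self.centers) == 0:
            return np.empty(0)
        d2 = np.sum((self.centers - s) ** 2, axis=1)
        return np.exp(-d2 / (2.0 * self.width ** 2))

    def value(self, s):
        phi = self._activations(s)
        return float(phi @ self.weights) if phi.size else 0.0

    def update(self, s, td_error):
        phi = self._activations(s)
        # Configure the topology: add a hidden unit if no existing one covers s.
        if phi.size == 0 or phi.max() < self.add_threshold:
            self.centers = np.vstack([self.centers, np.asarray(s, dtype=float)])
            self.weights = np.append(self.weights, 0.0)
            phi = self._activations(s)
        # Standard semi-gradient step on the linear output weights.
        self.weights += self.lr_linear * td_error * phi
        # Restricted step on the hidden layer: only the closest center moves,
        # following the gradient of the value estimate w.r.t. that center.
        k = int(np.argmax(phi))
        grad_center = self.weights[k] * phi[k] * (s - self.centers[k]) / self.width ** 2
        self.centers[k] += self.lr_center * td_error * grad_center
```

In this sketch, keeping the feature space nearly fixed between updates is what makes the linear part of the model behave well under bootstrapped TD targets; letting all centers drift with full gradient steps is the kind of aggressive modification the abstract associates with divergence of the standard method.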