Improving Gaussian process value function approximation in policy gradient algorithms

  • Authors:
  • Hunor Jakab; Lehel Csató

  • Affiliations:
  • Babeş-Bolyai University, Cluj-Napoca, Romania and Eötvös Loránd University, Budapest, Hungary (both authors)

  • Venue:
  • ICANN'11: Proceedings of the 21st International Conference on Artificial Neural Networks, Part II
  • Year:
  • 2011

Abstract

The use of value-function approximation in reinforcement learning (RL) problems is widely studied, its most common application being the extension of value-based RL methods to continuous domains. Gradient-based policy search algorithms can also benefit from an estimated value function, as the estimate can be used for gradient variance reduction. In this article we present a new value-function approximation method that uses a modified version of Kullback-Leibler (KL) distance-based sparse on-line Gaussian process regression. We combine it with Williams' episodic REINFORCE algorithm to reduce the variance of the gradient estimates. A significant computational overhead of the algorithm is caused by the need to completely re-estimate the value function after each gradient update step. To overcome this problem we propose a measure composed of a KL distance-based score and a time-dependent factor for exchanging obsolete basis vectors with newly acquired measurements. This method leads to a more stable estimation of the action value-function and also reduces gradient variance. Performance and convergence comparisons are provided for the described algorithm, which is tested on a dynamic-system control problem with a continuous state-action space.
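
The central mechanism described in the abstract is the rule for swapping obsolete basis vectors out of a fixed-size dictionary so the value-function estimate need not be rebuilt from scratch after every policy update. The sketch below is a minimal illustration of that idea, not the paper's algorithm: it assumes an RBF kernel, uses residual kernel variance as a stand-in for the KL distance-based score, and uses an exponential decay as the time-dependent factor. The class name `SparseBasisManager` and all parameter choices are hypothetical.

```python
import numpy as np

def rbf_kernel(x, y, ell=1.0):
    """Squared-exponential kernel (an assumed choice, not fixed by the paper)."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return np.exp(-0.5 * np.dot(d, d) / ell ** 2)

class SparseBasisManager:
    """Toy fixed-budget basis-vector dictionary for a sparse on-line GP.

    Eviction score = leave-one-out novelty (residual kernel variance, a proxy
    for a KL-based score) times a time-dependent ageing factor, so old,
    well-explained basis vectors make room for fresh measurements.
    """

    def __init__(self, budget=20, noise=1e-2, decay=0.99):
        self.budget = budget
        self.noise = noise
        self.decay = decay      # assumed exponential time factor
        self.bv = []            # basis vectors (state-action points)
        self.birth = []         # step at which each basis vector was added
        self.t = 0

    def _novelty(self, x, basis):
        """Residual kernel variance of x given `basis`."""
        if not basis:
            return rbf_kernel(x, x)
        K = np.array([[rbf_kernel(a, b) for b in basis] for a in basis])
        K += self.noise * np.eye(len(basis))
        k = np.array([rbf_kernel(x, b) for b in basis])
        return float(rbf_kernel(x, x) - k @ np.linalg.solve(K, k))

    def add(self, x):
        """Insert a new measurement; evict the lowest-scoring basis vector if over budget."""
        self.t += 1
        self.bv.append(np.asarray(x, dtype=float))
        self.birth.append(self.t)
        if len(self.bv) > self.budget:
            scores = []
            for i, (b, born) in enumerate(zip(self.bv, self.birth)):
                others = self.bv[:i] + self.bv[i + 1:]
                scores.append(self._novelty(b, others) * self.decay ** (self.t - born))
            worst = int(np.argmin(scores))
            del self.bv[worst]
            del self.birth[worst]
```

In a REINFORCE-style learner, `add` would be called for each visited state-action pair, so the dictionary (and hence the GP value-function estimate used for variance reduction) is updated incrementally across gradient steps rather than re-estimated completely after each one.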