From Q(λ) to average Q-learning: efficient implementation of an asymptotic approximation

  • Authors:
  • Frédérick Garcia; Florent Serre

  • Affiliations:
  • INRA, Unité de Biométrie et Intelligence Artificielle, Castanet-Tolosan cedex, France (both authors)

  • Venue:
  • IJCAI'01 Proceedings of the 17th international joint conference on Artificial intelligence - Volume 2
  • Year:
  • 2001

Abstract

Q(λ) is a reinforcement learning algorithm that combines Q-learning and TD(λ). Online implementations of Q(λ) that use eligibility traces have been shown to speed up basic Q-learning. In this paper we present an asymptotic analysis of Watkins' Q(λ) with accumulative eligibility traces. We first introduce an asymptotic approximation of Q(λ) that appears to be a gain matrix variant of basic Q-learning. Using the ODE method, we then determine an optimal gain matrix for Q-learning that maximizes its rate of convergence toward the optimal value function Q*. The similarity between this optimal gain and the asymptotic gain of Q(λ) explains the relative efficiency of the latter for λ > 0. Furthermore, by minimizing the difference between these two gains, optimal values for the λ parameter and the decreasing learning rates can be determined. This optimal λ strongly depends on the exploration policy followed during learning. A robust approximation of these learning parameters leads to the definition of a new efficient algorithm called AQ-learning (Average Q-learning), which shows a close resemblance to Schwartz' R-learning. Our results are demonstrated through numerical simulations.
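
For readers unfamiliar with the baseline the paper analyzes, the following is a minimal sketch of Watkins' Q(λ) with accumulating eligibility traces and an ε-greedy exploration policy. It follows the standard formulation of the algorithm (not the paper's AQ-learning variant); the environment interface `step(s, a)`, the toy chain task, and all hyper-parameter values are illustrative assumptions, not the authors' experimental setup.

```python
import numpy as np

def watkins_q_lambda(step, n_states, n_actions, n_episodes=200,
                     alpha=0.1, gamma=0.95, lam=0.7, epsilon=0.1, seed=0):
    """Watkins' Q(lambda) with accumulating eligibility traces.

    `step(s, a) -> (next_state, reward, done)` is a user-supplied
    environment function (hypothetical interface, for illustration only).
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))

    def eps_greedy(s):
        # epsilon-greedy exploration policy
        if rng.random() < epsilon:
            return int(rng.integers(n_actions))
        return int(np.argmax(Q[s]))

    for _ in range(n_episodes):
        e = np.zeros_like(Q)          # eligibility traces, reset each episode
        s, a = 0, eps_greedy(0)
        done = False
        while not done:
            s2, r, done = step(s, a)
            a2 = eps_greedy(s2)                        # next behaviour action
            a_star = int(np.argmax(Q[s2]))             # greedy action at s'
            delta = r + (0.0 if done else gamma * Q[s2, a_star]) - Q[s, a]
            e[s, a] += 1.0                             # accumulating trace
            Q += alpha * delta * e                     # update all traced pairs
            if a2 == a_star:
                e *= gamma * lam                       # decay traces along greedy path
            else:
                e[:] = 0.0                             # Watkins' cut after an exploratory action
            s, a = s2, a2
    return Q

# Usage on a hypothetical 5-state chain: move left/right, reward 1 at the right end.
def chain_step(s, a, n=5):
    s2 = max(0, s - 1) if a == 0 else min(n - 1, s + 1)
    return s2, float(s2 == n - 1), s2 == n - 1

Q = watkins_q_lambda(chain_step, n_states=5, n_actions=2)
```

In this standard form, the learning rate α and the trace parameter λ are fixed by hand; the paper's contribution is to derive, via the ODE method, which gain (and hence which λ and learning-rate schedule) maximizes the asymptotic rate of convergence under a given exploration policy.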