Recursive least-squares learning with eligibility traces

  • Authors:
  • Bruno Scherrer; Matthieu Geist

  • Affiliations:
  • INRIA, MAIA Project-Team, Nancy, France; Supélec, IMS Research Group, Metz, France

  • Venue:
  • EWRL'11: Proceedings of the 9th European conference on Recent Advances in Reinforcement Learning

  • Year:
  • 2011

Abstract

In the framework of Markov decision processes, we consider the problem of learning a linear approximation of the value function of some fixed policy from a single trajectory, possibly generated by some other policy. We describe a systematic approach for adapting on-policy least-squares learning algorithms from the literature (LSTD [5], LSPE [15], FPKF [7], and GPTD [8]/KTD [10]) to off-policy learning with eligibility traces. This leads to two known algorithms, LSTD(λ) and LSPE(λ) [21], and suggests new extensions of FPKF and GPTD/KTD. We describe their recursive implementations, discuss their convergence properties, and illustrate their behavior experimentally. Overall, our study suggests that the state-of-the-art LSTD(λ) [21] remains the best least-squares algorithm.
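
To make the recursive implementation mentioned in the abstract concrete, here is a minimal Python sketch of recursive off-policy LSTD(λ): instead of rebuilding and inverting the least-squares matrix A at each step, its inverse is maintained with a Sherman-Morrison rank-one update. The class name, the ε-regularized initialization, and the exact placement of the importance ratios ρ are illustrative assumptions (derivations in the literature differ on where the ratios enter), not the paper's precise formulation.

```python
import numpy as np


class RecursiveLSTDLambda:
    """Sketch of recursive off-policy LSTD(lambda) with linear features."""

    def __init__(self, n_features, gamma, lam, epsilon=1e-2):
        self.gamma = gamma
        self.lam = lam
        self.z = np.zeros(n_features)          # eligibility trace
        self.C = np.eye(n_features) / epsilon  # running estimate of A^{-1}
        self.theta = np.zeros(n_features)      # value-function weights
        self.rho_prev = 1.0                    # previous importance ratio

    def update(self, phi, reward, phi_next, rho=1.0):
        # rho = pi(a|s) / mu(a|s); rho == 1 recovers on-policy LSTD(lambda).
        # The trace carries the previous step's ratio (one common choice;
        # assumption, as variants place the ratios differently).
        self.z = self.gamma * self.lam * self.rho_prev * self.z + phi
        d = phi - self.gamma * rho * phi_next  # TD feature difference
        # Sherman-Morrison rank-one update of C ~= A^{-1}.
        Cz = self.C @ self.z
        gain = Cz / (1.0 + d @ Cz)
        self.theta = self.theta + gain * (rho * reward - d @ self.theta)
        self.C = self.C - np.outer(gain, d @ self.C)
        self.rho_prev = rho
        return self.theta
```

Leaving ρ = 1 throughout recovers on-policy recursive LSTD(λ), and setting λ = 0 recovers plain recursive LSTD; the per-step cost is O(p²) in the number of features p, versus O(p³) for a batch solve.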