Experimental analysis of eligibility traces strategies in temporal difference learning

  • Authors:
  • Jinsong Leng;Lakhmi Jain;Colin Fyfe

  • Affiliations:
  • School of Electrical and Information Engineering, Knowledge Based Intelligent Engineering Systems Centre, University of South Australia, Mawson Lakes SA 5095, Australia.;School of Electrical and Information Engineering, Knowledge Based Intelligent Engineering Systems Centre, University of South Australia, Mawson Lakes SA 5095, Australia.;Applied Computational Intelligence Research Unit, University of the West of Scotland, 1 Westerfield, High Calside, PA2 6BY, Paisley, Scotland

  • Venue:
  • International Journal of Knowledge Engineering and Soft Data Paradigms
  • Year:
  • 2009

Abstract

Temporal difference (TD) learning is a model-free reinforcement learning technique that adopts an infinite-horizon discounted model and applies incremental learning to dynamic programming. The state value function is updated from sample episodes. Eligibility traces are a key mechanism for increasing the rate of convergence, and TD(λ) denotes the use of eligibility traces through the parameter λ. However, the mechanism of eligibility traces combined with an approximation function is still not well understood, from either a theoretical or a practical point of view. The TD(λ) method has been proved to converge with a local tabular state representation, but proving the convergence of TD(λ) with function approximation remains an important open theoretical question. This paper investigates the convergence and the effects of different eligibility traces. We apply the Sarsa(λ) learning control algorithm to a large, stochastic and dynamic simulation environment called SoccerBots. The state value function is represented by a linear approximation function known as tile coding. The performance metrics generated by the simulation system are then used to analyse the mechanism of eligibility traces.
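The abstract combines three ingredients: Sarsa(λ) control, eligibility traces and a linear tile-coding approximator. The following is a minimal sketch of how they might fit together, under loudly stated assumptions. A hypothetical one-dimensional corridor (positions 0 to 10, reward -1 per step, episode ends at 10) stands in for SoccerBots. `TileCoder` and `run_sarsa_lambda` are illustrative names, not the authors' code. The sketch compares accumulating traces (add 1 to the active features) with replacing traces (reset the active features to 1). These are the two standard trace variants, but the abstract does not confirm that they are the strategies studied. All parameter values are illustrative.

```python
import numpy as np

GOAL = 10  # hypothetical corridor: positions 0..GOAL, episode ends at GOAL


class TileCoder:
    """Minimal 1-D tile coder: n_tilings offset grids over [low, high]."""

    def __init__(self, n_tilings, n_tiles, low, high):
        self.n_tilings = n_tilings
        self.n_tiles = n_tiles
        self.low = low
        self.width = (high - low) / n_tiles
        # One extra tile per tiling so that shifted inputs stay in range.
        self.size = n_tilings * (n_tiles + 1)

    def active(self, x):
        """Indices of the single active tile in each tiling for input x."""
        idx = []
        for t in range(self.n_tilings):
            offset = t * self.width / self.n_tilings
            tile = int((x - self.low + offset) // self.width)
            tile = min(max(tile, 0), self.n_tiles)
            idx.append(t * (self.n_tiles + 1) + tile)
        return idx


def run_sarsa_lambda(trace="replacing", lam=0.9, episodes=100, alpha=0.1,
                     gamma=1.0, epsilon=0.1, max_steps=1000, seed=0):
    """Linear Sarsa(lambda) on the toy corridor; returns steps per episode."""
    if trace not in ("accumulating", "replacing"):
        raise ValueError(f"unknown trace type: {trace}")
    rng = np.random.default_rng(seed)
    coder = TileCoder(n_tilings=8, n_tiles=10, low=0.0, high=1.0)
    w = np.zeros((2, coder.size))          # action 0 = left, 1 = right
    step_size = alpha / coder.n_tilings    # spread alpha across the tilings

    def q(p, a):
        # Linear action value: sum of the weights of the active tiles.
        return w[a, coder.active(p / GOAL)].sum()

    def policy(p):
        # Epsilon-greedy with random tie-breaking between equal values.
        if rng.random() < epsilon:
            return int(rng.integers(2))
        qs = np.array([q(p, 0), q(p, 1)])
        return int(rng.choice(np.flatnonzero(qs == qs.max())))

    lengths = []
    for _ in range(episodes):
        z = np.zeros_like(w)               # eligibility traces, reset per episode
        p, a = 0, policy(0)
        steps = 0
        while steps < max_steps:
            idx = coder.active(p / GOAL)
            if trace == "replacing":
                z[a, idx] = 1.0            # replacing trace: reset to 1
            else:
                z[a, idx] += 1.0           # accumulating trace: add 1
            p_next = min(p + 1, GOAL) if a == 1 else max(p - 1, 0)
            r = -1.0                       # cost of every step
            steps += 1
            if p_next == GOAL:             # terminal: no bootstrap term
                w += step_size * (r - q(p, a)) * z
                break
            a_next = policy(p_next)
            delta = r + gamma * q(p_next, a_next) - q(p, a)  # TD error
            w += step_size * delta * z     # update every eligible weight
            z *= gamma * lam               # decay all traces by gamma * lambda
            p, a = p_next, a_next
        lengths.append(steps)
    return lengths
```

For example, `run_sarsa_lambda("accumulating")` and `run_sarsa_lambda("replacing")` can be run with the same seed, and their episode lengths compared as a crude analogue of the performance metrics the abstract describes. The `max_steps` cap is a safeguard, because accumulating traces can grow well above 1 when a state is revisited repeatedly, which makes large step sizes prone to instability.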