Experimental analysis on Sarsa(λ) and Q(λ) with different eligibility traces strategies

  • Authors: Jinsong Leng, Colin Fyfe, Lakhmi C. Jain

  • Affiliations:
  • Jinsong Leng (corresponding author, E-mail: Jinsong.Leng@unisa.edu.au): School of Electrical and Information Engineering, Knowledge Based Intelligent Engineering Systems Centre, University of South Australia, Mawson Lakes SA 5095, Australia
  • Colin Fyfe: Applied Computational Intelligence Research Unit, University of the West of Scotland, Paisley, Scotland
  • Lakhmi C. Jain: School of Electrical and Information Engineering, Knowledge Based Intelligent Engineering Systems Centre, University of South Australia, Mawson Lakes SA 5095, Australia

  • Venue: Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology - Theoretical advances of intelligent paradigms
  • Year: 2009

Abstract

Temporal difference learning and eligibility traces are two key mechanisms for solving reinforcement learning problems. The temporal difference technique bootstraps the state value or state-action value at every step, as in dynamic programming, and learns by sampling episodes from experience, as in the Monte Carlo approach. An eligibility trace is a mechanism for recording the degree to which each state is eligible to undergo the learning update. This paper investigates the underlying mechanism of eligibility trace strategies using on-policy and off-policy learning algorithms, namely Sarsa(λ) and Q(λ). To do so, performance metrics are obtained by defining the learning problem in a simulation environment and running it in conjunction with the different learning algorithms. Measuring learning performance and analysing parameter sensitivity are expensive, however, because such metrics can only be obtained by repeating the experiment over a range of parameter values. This paper therefore presents a comparative study of the eligibility trace mechanism, with the objective of comparing and investigating the influence of the different trace strategies on learning performance.
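
As a concrete illustration of the trace mechanism described above, the sketch below implements tabular on-policy Sarsa(λ) with a switch between the two classic trace strategies, accumulating and replacing. It is a minimal sketch for illustration only, not the authors' implementation: the ChainWalk toy environment, its reset/step interface, and all parameter values are assumptions introduced for this example.

```python
import numpy as np

class ChainWalk:
    """Hypothetical toy task: a 1-D chain, start in the middle, reward 1 at the right end."""
    def __init__(self, n=11):
        self.n = n
    def reset(self):
        self.s = self.n // 2
        return self.s
    def step(self, a):
        self.s += 1 if a == 1 else -1          # action 1 = right, 0 = left
        done = self.s <= 0 or self.s >= self.n - 1
        reward = 1.0 if self.s >= self.n - 1 else 0.0
        return self.s, reward, done

def sarsa_lambda(env, n_states, n_actions, episodes=500,
                 alpha=0.1, gamma=0.99, lam=0.9, epsilon=0.1,
                 trace="accumulating"):
    """Tabular Sarsa(lambda) with a selectable eligibility-trace strategy."""
    Q = np.zeros((n_states, n_actions))

    def epsilon_greedy(s):
        if np.random.rand() < epsilon:
            return np.random.randint(n_actions)
        return int(np.argmax(Q[s]))

    for _ in range(episodes):
        e = np.zeros_like(Q)                    # traces are reset at each episode
        s = env.reset()
        a = epsilon_greedy(s)
        done = False
        while not done:
            s2, r, done = env.step(a)
            a2 = epsilon_greedy(s2)
            # TD error: bootstrap from the next state-action value (zero at terminal states)
            delta = r + (0.0 if done else gamma * Q[s2, a2]) - Q[s, a]
            # Mark the visited pair as eligible for learning
            if trace == "accumulating":
                e[s, a] += 1.0                  # traces build up on revisits
            else:
                e[s, a] = 1.0                   # replacing traces cap at 1
            # Credit every eligible pair in proportion to its (decayed) trace
            Q += alpha * delta * e
            e *= gamma * lam
            s, a = s2, a2
    return Q

Q = sarsa_lambda(ChainWalk(), n_states=11, n_actions=2, trace="replacing")
```

The off-policy counterpart, Watkins's Q(λ), differs in two places: the TD error bootstraps from the maximum next state-action value rather than from the action actually taken, and the traces are cut to zero whenever an exploratory (non-greedy) action is chosen, since the bootstrapped return then no longer follows the greedy target policy.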