Experimental analysis of eligibility traces strategies in temporal difference learning

Authors:
Jinsong Leng;Lakhmi Jain;Colin Fyfe
Affiliations:
School of Electrical and Information Engineering, Knowledge Based Intelligent Engineering Systems Centre, University of South Australia, Mawson Lakes SA 5095, Australia.;School of Electrical and Information Engineering, Knowledge Based Intelligent Engineering Systems Centre, University of South Australia, Mawson Lakes SA 5095, Australia.;Applied Computational Intelligence Research Unit, University of the West of Scotland, 1 Westerfield, High Calside, PA2 6BY, Paisley, Scotland
Venue:
International Journal of Knowledge Engineering and Soft Data Paradigms
Year:
2009

Abstract

Temporal difference (TD) learning is a model-free reinforcement learning technique that adopts an infinite-horizon discounted model and uses an incremental learning technique for dynamic programming. The state value function is updated in terms of sample episodes. Utilising eligibility traces is a key mechanism in enhancing the rate of convergence. TD(λ) represents the use of eligibility traces by introducing the parameter λ. However, the underlying mechanism of eligibility traces with an approximation function has not been well understood, either from a theoretical or from a practical point of view. The TD(λ) method has been proved to be convergent with a local tabular state representation. Unfortunately, proving convergence of TD(λ) with function approximation is still an important open theoretical question. This paper aims to investigate the convergence and the effects of different eligibility traces. We adopt the Sarsa(λ) learning control algorithm with a large, stochastic and dynamic simulation environment called SoccerBots. The state value function is represented by a linear approximation function known as tile coding. The performance metrics generated from the simulation system can be used to analyse the mechanism of eligibility traces.
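
To make the role of the trace strategy concrete, the following is a minimal sketch of one Sarsa(λ) update with linear function approximation over binary features, as tile coding produces. It is not the authors' implementation. The function names, the parameter names (alpha, gamma, lam) and the two strategy labels are assumptions made for illustration; the abstract does not list which trace strategies the paper compares, and accumulating and replacing traces are used here only as two common variants.

```python
import numpy as np

def update_traces(traces, active, gamma, lam, strategy="replacing"):
    """Decay every trace by gamma * lam, then mark the active features.

    `active` holds the distinct indices of the binary features (tiles)
    that are on for the current state-action pair (hypothetical layout).
    """
    traces = gamma * lam * traces          # new array; the input is not mutated
    if strategy == "accumulating":
        traces[active] += 1.0              # traces can grow beyond 1
    elif strategy == "replacing":
        traces[active] = 1.0               # traces are reset to exactly 1
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return traces

def sarsa_lambda_step(weights, traces, active, reward, next_active,
                      alpha, gamma, lam, strategy="replacing", terminal=False):
    """One Sarsa(lambda) update for a linear approximator with binary features."""
    q = weights[active].sum()                                  # Q(s, a)
    q_next = 0.0 if terminal else weights[next_active].sum()   # Q(s', a')
    delta = reward + gamma * q_next - q                        # TD error
    traces = update_traces(traces, active, gamma, lam, strategy)
    weights = weights + alpha * delta * traces                 # credit all traced features
    return weights, traces
```

The two strategies differ only when a feature that still carries a decayed trace becomes active again: the accumulating variant adds 1 to the remaining trace, while the replacing variant resets it to 1.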