Temporal difference learning and simulated annealing for optimal control: a case study
KES-AMSTA'08 Proceedings of the 2nd KES International conference on Agent and multi-agent systems: technologies and applications
Temporal difference (TD) learning is a form of approximate reinforcement learning based on incremental learning updates. For large, stochastic, and dynamic systems, however, it remains an open question how to analyse the convergence and parameter sensitivity of TD algorithms, as no general methodology exists. Moreover, empirical analysis of convergence and sensitivity is very expensive: such metrics can be obtained only by rerunning experiments with different parameter values. In this paper, we apply the TD(λ) learning control algorithm with a linear function approximation technique known as tile coding to help soccer agents learn optimal control processes. The aim of this paper is to propose a methodology for analysing performance and adaptively selecting a set of optimal parameter values for the TD(λ) learning algorithm.
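The two ingredients named in the abstract, TD(λ) updates and tile coding, can be sketched together. The following is a minimal illustration, not the paper's implementation: it assumes a scalar state in [0, 1), accumulating eligibility traces, and a prediction (state-value) update; the function names and the tiling layout are hypothetical.

```python
import numpy as np

def tile_features(state, n_tilings=4, n_tiles=8, low=0.0, high=1.0):
    """Tile coding for a scalar state: each of the n_tilings grids is
    slightly offset, so the state activates one tile per tiling.
    Returns the flat indices of the active binary features."""
    scaled = (state - low) / (high - low)        # normalise to [0, 1)
    indices = []
    for t in range(n_tilings):
        offset = t / (n_tilings * n_tiles)       # shift each tiling slightly
        tile = int((scaled + offset) * n_tiles) % n_tiles
        indices.append(t * n_tiles + tile)       # flat index into weight vector
    return indices

def td_lambda_update(w, z, active, reward, next_active, alpha, gamma, lam):
    """One TD(lambda) prediction step with accumulating traces.
    w: weight vector, z: eligibility traces (same shape as w),
    active / next_active: feature indices for s and s' (empty if terminal)."""
    v = w[active].sum()                          # V(s) under linear approximation
    v_next = w[next_active].sum()                # V(s'); 0 for a terminal state
    delta = reward + gamma * v_next - v          # TD error
    z *= gamma * lam                             # decay all traces
    z[active] += 1.0                             # accumulate on active features
    w += alpha * delta * z                       # update weights along the traces
    return w, z
```

In this toy setting, repeatedly applying the update to a state that transitions to a terminal state with reward 1 drives its estimated value toward 1; the parameters α, γ, and λ passed to `td_lambda_update` are exactly the quantities whose sensitivity the paper's methodology is meant to analyse.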