Temporal difference (TD) learning is a model-free reinforcement learning technique that adopts an infinite-horizon discounted model and uses incremental updates for dynamic programming. The state value function is updated from sample episodes. Eligibility traces are a key mechanism for improving the rate of convergence; TD(λ) denotes their use, controlled by the parameter λ. However, the behaviour of eligibility traces combined with a function approximator is not well understood, from either a theoretical or a practical point of view. TD(λ) has been proved to converge with a tabular state representation, but proving the convergence of TD(λ) with function approximation remains an important open theoretical question. This paper investigates the convergence and the effects of different eligibility traces. We apply the Sarsa(λ) learning control algorithm in a large, stochastic, and dynamic simulation environment called SoccerBots, with the state value function represented by a linear function approximator known as tile coding. The performance metrics generated by the simulation system are used to analyse the mechanism of eligibility traces.
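To make the setup concrete, the following is a minimal sketch of Sarsa(λ) with replacing eligibility traces over a linear value approximation. The toy environment, the one-hot feature mapping (a stand-in for tile coding), and all parameter values are illustrative assumptions, not the paper's actual SoccerBots configuration.

```python
import random

N_FEATURES = 8   # size of the (hypothetical) binary feature vector
N_ACTIONS = 2
ALPHA, GAMMA, LAM, EPSILON = 0.1, 0.9, 0.8, 0.1

# One weight vector per action; Q(s, a) = w[a] . phi(s).
weights = [[0.0] * N_FEATURES for _ in range(N_ACTIONS)]

def phi(state):
    """Stand-in for tile coding: one active binary feature per state."""
    f = [0.0] * N_FEATURES
    f[state % N_FEATURES] = 1.0
    return f

def q_value(state, action):
    return sum(w * x for w, x in zip(weights[action], phi(state)))

def epsilon_greedy(state):
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: q_value(state, a))

def sarsa_lambda_episode(step_fn, start_state, max_steps=100):
    """One episode of Sarsa(lambda) with replacing traces.

    step_fn(state, action) -> (reward, next_state, done) is the
    environment interface assumed here.
    """
    traces = [[0.0] * N_FEATURES for _ in range(N_ACTIONS)]
    state, action = start_state, epsilon_greedy(start_state)
    for _ in range(max_steps):
        reward, next_state, done = step_fn(state, action)
        next_action = epsilon_greedy(next_state)
        target = reward if done else reward + GAMMA * q_value(next_state, next_action)
        delta = target - q_value(state, action)
        feats = phi(state)
        for a in range(N_ACTIONS):
            for i in range(N_FEATURES):
                if a == action and feats[i] > 0:
                    # Replacing traces: active features are reset to 1,
                    # rather than accumulated as in the original TD(lambda).
                    traces[a][i] = 1.0
                else:
                    traces[a][i] *= GAMMA * LAM
                weights[a][i] += ALPHA * delta * traces[a][i]
        if done:
            break
        state, action = next_state, next_action
```

The trace vector lets a single TD error update the weights of all recently visited state-action features, decayed by γλ, which is the mechanism whose effect on convergence the paper studies empirically.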