Enhanced temporal difference learning using compiled eligibility traces

Authors:
Peter Vamplew; Robert Ollington; Mark Hepburn
Affiliations:
School of Information Technology and Mathematical Sciences, University of Ballarat, Ballarat, Victoria, Australia; School of Computing, University of Tasmania, Hobart, Tasmania, Australia; School of Computing, University of Tasmania, Hobart, Tasmania, Australia
Venue:
AI'06 Proceedings of the 19th Australian joint conference on Artificial Intelligence: advances in Artificial Intelligence
Year:
2006

Abstract

Eligibility traces have been shown to substantially improve the convergence speed of temporal difference learning algorithms by maintaining a record of recently experienced states. This paper presents compiled traces, an extension of conventional eligibility traces that retains additional information about the agent's experience within the environment. Empirical results show that compiled traces outperform conventional traces on policy evaluation tasks that use a tabular representation of the state values.
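
As a point of reference for the conventional traces that the abstract builds on, the following is a minimal sketch of tabular TD(λ) policy evaluation with accumulating eligibility traces. It does not implement compiled traces, whose details are not given in this record. The random-walk task, the function name `td_lambda_random_walk`, and all parameter values are assumptions chosen purely for illustration.

```python
import random


def td_lambda_random_walk(episodes=1000, n_states=5, alpha=0.05,
                          gamma=1.0, lam=0.8, seed=0):
    """Tabular TD(lambda) policy evaluation on a hypothetical random walk.

    The agent starts in the middle state and moves left or right with equal
    probability. Stepping off the right end gives reward 1, stepping off the
    left end gives reward 0, and both end the episode.
    """
    rng = random.Random(seed)
    values = [0.0] * n_states            # one value estimate per state (tabular)
    for _ in range(episodes):
        traces = [0.0] * n_states        # eligibility traces, reset each episode
        state = n_states // 2
        while True:
            next_state = state + rng.choice((-1, 1))
            done = next_state < 0 or next_state >= n_states
            reward = 1.0 if next_state >= n_states else 0.0
            next_value = 0.0 if done else values[next_state]

            # One-step temporal difference error for the transition just taken.
            td_error = reward + gamma * next_value - values[state]

            # Accumulating trace: record the visit to the current state.
            traces[state] += 1.0

            # Every recently visited state shares in the update, weighted by
            # its trace, and all traces then decay by gamma * lambda.
            for s in range(n_states):
                values[s] += alpha * td_error * traces[s]
                traces[s] *= gamma * lam

            if done:
                break
            state = next_state
    return values


if __name__ == "__main__":
    print(td_lambda_random_walk())
```

The trace vector is the "record of recently experienced states": a single TD error updates every state with a nonzero trace, which is what lets information propagate faster than a one-step update. Setting `lam=0.0` in this sketch reduces it to ordinary one-step TD(0).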