Eligibility traces have been shown to substantially improve the convergence speed of temporal difference learning algorithms by maintaining a record of recently experienced states. This paper presents compiled traces, an extension of conventional eligibility traces that retains additional information about the agent's experience within the environment. Empirical results show that compiled traces outperform conventional traces on policy evaluation tasks using a tabular representation of the state values.
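For orientation, the conventional baseline that the abstract refers to can be sketched as tabular TD(λ) policy evaluation with accumulating eligibility traces. This is a minimal illustration only, not the paper's compiled-trace method, whose details are not given here. The random-walk task, the function name `td_lambda_random_walk`, and all parameter values are assumptions chosen for the example.

```python
import random


def td_lambda_random_walk(n_states=5, episodes=500, alpha=0.05,
                          gamma=1.0, lam=0.8, seed=0):
    """Tabular TD(lambda) policy evaluation with conventional accumulating traces.

    Hypothetical task (an assumption for this sketch): a random walk over
    n_states states, starting in the middle, moving left or right with equal
    probability. Stepping off the right end gives reward 1; off the left, 0.
    """
    rng = random.Random(seed)
    values = [0.0] * n_states

    for _ in range(episodes):
        # Traces record how recently and how often each state was visited.
        traces = [0.0] * n_states
        state = n_states // 2

        while True:
            next_state = state + rng.choice((-1, 1))
            done = next_state < 0 or next_state >= n_states
            reward = 1.0 if next_state >= n_states else 0.0
            next_value = 0.0 if done else values[next_state]

            # One-step temporal difference error for the current transition.
            td_error = reward + gamma * next_value - values[state]

            # Accumulating trace: bump the eligibility of the visited state.
            traces[state] += 1.0

            # Credit every recently visited state in proportion to its trace,
            # then decay all traces by gamma * lambda.
            for s in range(n_states):
                values[s] += alpha * td_error * traces[s]
                traces[s] *= gamma * lam

            if done:
                break
            state = next_state

    return values


if __name__ == "__main__":
    print(td_lambda_random_walk())
```

Because the traces carry each TD error back to earlier states in the episode, a single rewarding transition updates several state values at once, which is the source of the convergence speed-up described above.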