Postponed Updates for Temporal-Difference Reinforcement Learning

  • Authors:
  • Harm van Seijen; Shimon Whiteson

  • Venue:
  • ISDA '09 Proceedings of the 2009 Ninth International Conference on Intelligent Systems Design and Applications
  • Year:
  • 2009
Abstract

This paper presents postponed updates, a new strategy for TD methods that can improve sample efficiency without incurring the computational and space requirements of model-based RL. By recording the experience from its last visit to a state, the agent can postpone the corresponding update until that state is revisited, thereby improving the quality of the update. Experimental results demonstrate that the postponed updates strategy outperforms several competitors, most notably eligibility traces, a traditional way to improve the sample efficiency of TD methods. Moreover, it achieves this without the extra parameter that eligibility traces require tuning.
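
To make the mechanism described in the abstract concrete, the following is a minimal Python sketch of a tabular Q-learning agent with postponed updates, based only on the description above. The class and method names are illustrative assumptions, not the authors' implementation; the paper's actual algorithm may differ in detail.

```python
import random
from collections import defaultdict

class PostponedQLearning:
    """Sketch of tabular Q-learning with postponed updates (assumption-based).

    Instead of updating Q(s, a) immediately, the experience (a, r, s') from
    the last visit to s is stored and the update is applied the next time s
    is visited, so the bootstrap target Q(s', .) has had time to improve.
    """

    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.q = defaultdict(float)   # Q(s, a) table, default 0.0
        self.last_visit = {}          # s -> (a, r, s', terminal)
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, state):
        # Epsilon-greedy action selection.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def _apply_update(self, state):
        # Perform the TD update recorded at the previous visit to `state`.
        a, r, next_state, terminal = self.last_visit.pop(state)
        target = r if terminal else r + self.gamma * max(
            self.q[(next_state, b)] for b in self.actions)
        self.q[(state, a)] += self.alpha * (target - self.q[(state, a)])

    def step(self, state, action, reward, next_state, terminal):
        # On revisiting `state`, flush the update postponed since last time.
        if state in self.last_visit:
            self._apply_update(state)
        # Record the current experience; its update is postponed until
        # `state` is visited again or the episode ends.
        self.last_visit[state] = (action, reward, next_state, terminal)

    def end_episode(self):
        # Flush all remaining postponed updates at the end of an episode.
        for state in list(self.last_visit):
            self._apply_update(state)
```

A typical control loop would call step() after every transition and end_episode() when the episode terminates; only a single last-visit experience is stored per state, which keeps the memory overhead far below that of a learned model.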