Discrete-time, Discrete-valued Observable Operator Models: a Tutorial
Learning low dimensional predictive representations. In ICML '04: Proceedings of the Twenty-First International Conference on Machine Learning.
Predictive state representations: a new theory for modeling dynamical systems. In UAI '04: Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence.
TD(λ) networks: temporal-difference networks with eligibility traces. In ICML '05: Proceedings of the 22nd International Conference on Machine Learning.
On-line discovery of temporal-difference networks. In Proceedings of the 25th International Conference on Machine Learning.
Epoch-Incremental Queue-Dyna Algorithm. In ICAISC '08: Proceedings of the 9th International Conference on Artificial Intelligence and Soft Computing.
Proto-predictive representation of states with simple recurrent temporal-difference networks. In ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning.
Temporal-difference (TD) networks are a formalism for expressing and learning grounded world knowledge in a predictive form [Sutton and Tanner, 2005]. However, not all partially observable Markov decision processes can be efficiently learned with TD networks. In this paper, we extend TD networks by allowing the network-update process (answer network) to depend on the recent history of previous actions and observations rather than only on the most recent action and observation. We show that this extension enables the solution of a larger class of problems than can be solved by the original TD networks or by history-based methods alone. In addition, we apply TD networks to a problem that, while still simple, is significantly larger than has previously been considered. We show that history-extended TD networks can learn much of the common-sense knowledge of an egocentric gridworld domain with a single bit of perception.
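The extension described above can be illustrated with a minimal sketch. In a TD network, the answer network maps the previous prediction vector and the current action and observation to new predictions; the history extension instead conditions on a window of the last few (action, observation) pairs. The network shape, sizes, and sigmoid answer network below are illustrative assumptions, not the authors' exact architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 predictions, a 2-step history window,
# binary actions and a single bit of observation.
N_PRED, HISTORY, N_ACT, N_OBS = 4, 2, 2, 2

def one_hot(i, n):
    v = np.zeros(n)
    v[i] = 1.0
    return v

def feature_vector(y_prev, history):
    """Concatenate the previous predictions with the last HISTORY
    (action, observation) pairs -- the history extension. The original
    TD network would use only the most recent pair (HISTORY = 1).
    Assumes `history` contains at least HISTORY pairs."""
    parts = [y_prev]
    for a, o in history[-HISTORY:]:
        parts.append(one_hot(a, N_ACT))
        parts.append(one_hot(o, N_OBS))
    return np.concatenate(parts)

DIM = N_PRED + HISTORY * (N_ACT + N_OBS)
W = rng.normal(scale=0.1, size=(N_PRED, DIM))  # answer-network weights

def answer_network(y_prev, history):
    """One update of the (history-extended) answer network:
    a single sigmoid layer producing the new prediction vector."""
    x = feature_vector(y_prev, history)
    return 1.0 / (1.0 + np.exp(-W @ x))

# One step: predictions remain probabilities in (0, 1).
y = np.full(N_PRED, 0.5)
hist = [(0, 1), (1, 0)]          # recent (action, observation) pairs
y = answer_network(y, hist)
```

In learning, W would be adjusted by a TD rule so that each prediction matches its target (an observation bit or another prediction at the next step); the point here is only that the input features carry a history window rather than a single action-observation pair.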