Discrete-time, Discrete-valued Observable Operator Models: a Tutorial
Learning low dimensional predictive representations. In ICML '04: Proceedings of the Twenty-First International Conference on Machine Learning.
Predictive state representations: a new theory for modeling dynamical systems. In UAI '04: Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence.
TD(λ) networks: temporal-difference networks with eligibility traces. In ICML '05: Proceedings of the 22nd International Conference on Machine Learning.
On-line discovery of temporal-difference networks. In Proceedings of the 25th International Conference on Machine Learning.
Epoch-Incremental Queue-Dyna Algorithm. In ICAISC '08: Proceedings of the 9th International Conference on Artificial Intelligence and Soft Computing.
Proto-predictive representation of states with simple recurrent temporal-difference networks. In ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning.
Temporal-difference (TD) networks are a formalism for expressing and learning grounded world knowledge in a predictive form [Sutton and Tanner, 2005]. However, not all partially observable Markov decision processes can be efficiently learned with TD networks. In this paper, we extend TD networks by allowing the network-update process (answer network) to depend on the recent history of previous actions and observations rather than only on the most recent action and observation. We show that this extension enables the solution of a larger class of problems than can be solved by the original TD networks or by history-based methods alone. In addition, we apply TD networks to a problem that, while still simple, is significantly larger than has previously been considered. We show that history-extended TD networks can learn much of the common-sense knowledge of an egocentric gridworld domain with a single bit of perception.
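The extension described above can be illustrated with a minimal sketch. In a TD network, the answer network maps the previous prediction vector and the current action and observation to new predictions; the history extension instead conditions on a window of the last few (action, observation) pairs. The network shape, sizes, and sigmoid answer network below are illustrative assumptions, not the authors' exact architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 predictions, a 2-step history window,
# binary actions and a single bit of observation.
N_PRED, HISTORY, N_ACT, N_OBS = 4, 2, 2, 2

def one_hot(i, n):
    v = np.zeros(n)
    v[i] = 1.0
    return v

def feature_vector(y_prev, history):
    """Concatenate the previous predictions with the last HISTORY
    (action, observation) pairs -- the history extension. The original
    TD network would use only the most recent pair (HISTORY = 1).
    Assumes `history` contains at least HISTORY pairs."""
    parts = [y_prev]
    for a, o in history[-HISTORY:]:
        parts.append(one_hot(a, N_ACT))
        parts.append(one_hot(o, N_OBS))
    return np.concatenate(parts)

DIM = N_PRED + HISTORY * (N_ACT + N_OBS)
W = rng.normal(scale=0.1, size=(N_PRED, DIM))  # answer-network weights

def answer_network(y_prev, history):
    """One update of the (history-extended) answer network:
    a single sigmoid layer producing the new prediction vector."""
    x = feature_vector(y_prev, history)
    return 1.0 / (1.0 + np.exp(-W @ x))

# One step: predictions remain probabilities in (0, 1).
y = np.full(N_PRED, 0.5)
hist = [(0, 1), (1, 0)]          # recent (action, observation) pairs
y = answer_network(y, hist)
```

In learning, W would be adjusted by a TD rule so that each prediction matches its target (an observation bit or another prediction at the next step); the point here is only that the input features carry a history window rather than a single action-observation pair.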