Improving generalization for temporal difference learning: The successor representation

Author: Peter Dayan
Affiliation: Computational Neurobiology Laboratory, The Salk Institute, P.O. Box 85800, San Diego, CA 92186-5800, USA
Venue: Neural Computation
Year: 1993

Abstract

Estimation of returns over time, the focus of temporal difference (TD) algorithms, imposes particular constraints on good function approximators or representations. Appropriate generalization between states is determined by how similar their successors are, and representations should follow suit. This paper shows how TD machinery can be used to learn such representations, and illustrates, using a navigation task, the appropriately distributed nature of the result.
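
To make the idea in the abstract concrete, the following is a minimal sketch of how a TD-style update might learn a successor representation, here taken in its common formulation as the expected discounted number of future visits to each state. It is an illustration under stated assumptions, not the paper's experiment: the five-state ring with an unbiased random walk stands in for the navigation task, and the function name `learn_successor_representation` and the values of `gamma`, `alpha`, `n_steps`, and `seed` are hypothetical choices made for this example.

```python
import random


def learn_successor_representation(n_states=5, gamma=0.9, alpha=0.05,
                                   n_steps=50_000, seed=0):
    """Estimate M[s][j], the expected discounted future occupancy of state j
    when starting from state s, with a tabular TD(0)-style update.

    Assumed environment (not from the paper): a ring of n_states states
    where the agent steps left or right with equal probability.
    """
    rng = random.Random(seed)
    M = [[0.0] * n_states for _ in range(n_states)]
    s = 0
    for _ in range(n_steps):
        # Unbiased random walk on the ring.
        s_next = (s + rng.choice((-1, 1))) % n_states
        for j in range(n_states):
            # The "reward" for predicting occupancy of j is 1 when we are in j.
            indicator = 1.0 if j == s else 0.0
            td_error = indicator + gamma * M[s_next][j] - M[s][j]
            M[s][j] += alpha * td_error
        s = s_next
    return M


M = learn_successor_representation()
for row in M:
    print(" ".join(f"{x:5.2f}" for x in row))
```

In this sketch each row of `M` is a distributed code for one state. Rows for neighbouring states come out similar because their successors overlap, which is the sense in which generalization follows successor similarity. One sanity check is that every row should sum to roughly 1 / (1 - gamma), since exactly one state is occupied at each time step.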