Practical Issues in Temporal Difference Learning

Authors:
Gerald Tesauro
Affiliations:
IBM Thomas J. Watson Research Center, P.O. Box 704, Yorktown Heights, NY 10598 USA
Venue:
Machine Learning
Year:
1992

Abstract

This paper examines whether temporal difference methods for training connectionist networks, such as Sutton's TD(λ) algorithm, can be successfully applied to complex real-world problems. A number of important practical issues are identified and discussed from a general theoretical perspective. These practical issues are then examined in the context of a case study in which TD(λ) is applied to learning the game of backgammon from the outcome of self-play. This is apparently the first application of this algorithm to a complex non-trivial task. It is found that, with zero knowledge built in, the network is able to learn from scratch to play the entire game at a fairly strong intermediate level of performance, which is clearly better than conventional commercial programs, and which in fact surpasses comparable networks trained on a massive human expert data set. This indicates that TD learning may work better in practice than one would expect based on current theory, and it suggests that further analysis of TD methods, as well as applications in other complex domains, may be worth investigating.
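
For readers unfamiliar with the TD(λ) update the abstract refers to, the following is a minimal sketch under stated assumptions, not the network or training setup used in the paper. It swaps backgammon for a small symmetric random walk and swaps the connectionist network for a table with one weight per state (equivalent to a linear predictor over one-hot features). The function name `td_lambda_random_walk` and all parameter values are hypothetical choices for illustration. Each prediction is the estimated probability that the walk ends on the right-hand side; the weights are adjusted by the difference between successive predictions, with credit spread over earlier states through an eligibility trace that decays by λ at every step.

```python
import random


def td_lambda_random_walk(n_states=5, episodes=5000, alpha=0.01, lam=0.7, seed=0):
    """Illustrative TD(lambda) on a symmetric random walk (hypothetical setup).

    Returns one weight per non-terminal state, each estimating the
    probability that the walk terminates off the right-hand end.
    """
    rng = random.Random(seed)
    w = [0.5] * n_states  # initial predictions for every state

    for _ in range(episodes):
        e = [0.0] * n_states  # eligibility traces, reset at episode start
        s = n_states // 2     # every episode starts in the middle state

        while True:
            s_next = s + (1 if rng.random() < 0.5 else -1)

            # The target is the next prediction, or the actual outcome
            # (0 for left exit, 1 for right exit) once the episode ends.
            if s_next < 0:
                target, done = 0.0, True
            elif s_next >= n_states:
                target, done = 1.0, True
            else:
                target, done = w[s_next], False

            delta = target - w[s]  # temporal difference error

            # Decay all traces by lambda, then add the gradient of the
            # current prediction, which is one-hot in this tabular case.
            for i in range(n_states):
                e[i] *= lam
            e[s] += 1.0

            # Move every weight in proportion to its trace.
            for i in range(n_states):
                w[i] += alpha * delta * e[i]

            if done:
                break
            s = s_next

    return w


if __name__ == "__main__":
    for state, value in enumerate(td_lambda_random_walk()):
        print(f"state {state}: predicted P(right exit) = {value:.3f}")
```

In this toy problem the exact answer for state i (counting from 0) is (i + 1) / (n_states + 1), so the learned weights can be checked against it. Setting `lam=0.0` reduces the update to one-step TD, while `lam=1.0` assigns credit to every earlier state in the episode with no decay.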