Temporal difference learning and TD-Gammon

Author: Gerald Tesauro
Affiliation: IBM Thomas J. Watson Research Center, P.O. Box 704, Yorktown Heights, NY
Venue: Communications of the ACM
Year: 1995

Abstract

Ever since Shannon's proposal for a chess-playing algorithm [12] and Samuel's checkers-learning program [10], complex board games such as Go, chess, checkers, Othello, and backgammon have been widely regarded as an ideal testing ground for exploring concepts and approaches in artificial intelligence and machine learning. Such games are challenging because playing them at expert level requires tremendous complexity and sophistication. At the same time, the problem inputs and performance measures are clear-cut and well defined, and the game environment is readily automated: it is easy to simulate the board, the rules of legal play, and the rules that determine when the game is over and what its outcome is.