Learning to Play Chess Using Temporal Differences

Authors:
Jonathan Baxter;Andrew Tridgell;Lex Weaver
Affiliations:
Department of Systems Engineering, Australian National University 0200, Australia. jonathan.baxter@anu.edu.au;Department of Computer Science, Australian National University 0200, Australia. andrew.tridgell@cs.anu.edu.au;Department of Computer Science, Australian National University 0200, Australia. lex.weaver@cs.anu.edu.au
Venue:
Machine Learning
Year:
2000

Abstract

In this paper we present TDLEAF(λ), a variation on the TD(λ) algorithm that enables it to be used in conjunction with game-tree search. We present some experiments in which our chess program “KnightCap” used TDLEAF(λ) to learn its evaluation function while playing on Internet chess servers. The main success we report is that KnightCap improved from a 1650 rating to a 2150 rating in just 308 games and 3 days of play. As a reference, a rating of 1650 corresponds to about level B human play (on a scale from E (1000) to A (1800)), while 2150 is human master level. We discuss some of the reasons for this success, principal among them being the use of on-line play rather than self-play. We also investigate whether TDLEAF(λ) can yield better results in the domain of backgammon, where TD(λ) has previously yielded striking success.
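
The abstract does not spell out the update rule, so the following is only a minimal sketch of the TDLEAF(λ) idea as it is usually stated: temporal differences are taken between the evaluations of the principal-variation leaves found by search, not between the root positions themselves. The linear evaluation, the function names `evaluate` and `tdleaf_update`, the feature vectors, and the default values of `alpha` and `lam` are all assumptions made for illustration, not details taken from this record.

```python
from typing import List, Sequence


def evaluate(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear evaluation J(x, w) = w . phi(x) (an assumed, simplified form)."""
    return sum(w * f for w, f in zip(weights, features))


def tdleaf_update(
    weights: Sequence[float],
    leaf_features: Sequence[Sequence[float]],
    outcome: float,
    alpha: float = 0.1,
    lam: float = 0.7,
) -> List[float]:
    """Return updated weights after one game.

    leaf_features[t] is the (hypothetical) feature vector of the
    principal-variation leaf reached by searching from the t-th position.
    outcome is the value assigned to the final position of the game.
    """
    # Leaf evaluations under the current weights, followed by the outcome.
    values = [evaluate(weights, phi) for phi in leaf_features] + [outcome]
    # Temporal differences d_t = J(t + 1) - J(t).
    diffs = [values[t + 1] - values[t] for t in range(len(leaf_features))]

    new_weights = list(weights)
    discounted = 0.0
    # Walk backwards so that discounted == sum over j >= t of lam**(j - t) * d_j.
    for t in range(len(leaf_features) - 1, -1, -1):
        discounted = diffs[t] + lam * discounted
        # For a linear evaluation, the gradient with respect to w is phi itself.
        for i, f in enumerate(leaf_features[t]):
            new_weights[i] += alpha * f * discounted
    return new_weights


if __name__ == "__main__":
    # Two positions with toy one-hot features, followed by a win (outcome 1.0).
    print(tdleaf_update([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], 1.0, alpha=0.5, lam=0.5))
```

In this sketch, setting `lam` to 0 makes each leaf evaluation move only toward its immediate successor, while values closer to 1 propagate the final outcome further back through the game.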