Programming backgammon using self-teaching neural nets

  • Authors: Gerald Tesauro
  • Affiliation: IBM Thomas J. Watson Research Center, Hawthorne, NY
  • Venue: Artificial Intelligence - Chips challenging champions: games, computers and Artificial Intelligence
  • Year: 2002

Abstract

TD-Gammon is a neural network that is able to teach itself to play backgammon solely by playing against itself and learning from the results. Starting from random initial play, TD-Gammon's self-teaching methodology results in a surprisingly strong program: without lookahead, its positional judgement rivals that of human experts, and when combined with shallow lookahead, it reaches a level of play that surpasses even the best human players. The success of TD-Gammon has also been replicated by several other programmers; at least two other neural net programs also appear to be capable of superhuman play. Previous papers on TD-Gammon have focused on developing a scientific understanding of its reinforcement learning methodology. This paper views machine learning as a tool in a programmer's toolkit, and considers how it can be combined with other programming techniques to achieve and surpass world-class backgammon play. Particular emphasis is placed on programming shallow-depth search algorithms, and on TD-Gammon's doubling algorithm, which is described in print here for the first time. Copyright 2002 Elsevier Science B.V.
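The self-teaching methodology the abstract refers to is temporal-difference learning, TD(λ). As an illustration only, here is a minimal sketch of a single TD(λ) weight update with eligibility traces, written for a linear value function for brevity (TD-Gammon itself trained a multilayer neural network on features of the board position); the function name and parameter defaults are assumptions, not the paper's code.

```python
def td_lambda_update(w, e, x, x_next, reward, alpha=0.1, gamma=1.0, lam=0.7):
    """One TD(lambda) step for a linear value estimate V(x) = w . x.

    w      -- weight vector (updated in place)
    e      -- eligibility trace vector (updated in place)
    x      -- feature vector of the current position
    x_next -- feature vector of the successor position
    reward -- reward observed on the transition (nonzero only at game end)
    Returns the TD error delta = r + gamma * V(x_next) - V(x).
    """
    v = sum(wi * xi for wi, xi in zip(w, x))
    v_next = sum(wi * xi for wi, xi in zip(w, x_next))
    delta = reward + gamma * v_next - v
    for i in range(len(w)):
        # decay the trace and add the current gradient (= features, for linear V)
        e[i] = gamma * lam * e[i] + x[i]
        w[i] += alpha * delta * e[i]
    return delta
```

In self-play training, each move's successor position supplies `x_next`, and the final game outcome supplies the only nonzero `reward`, so value estimates propagate backward through the game without any human-labelled training data.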