Temporal difference learning applied to a high-performance game-playing program

  • Authors:
  • Jonathan Schaeffer; Markian Hlynka; Vili Jussila

  • Affiliations:
  • Department of Computing Science, University of Alberta, Edmonton, Canada; Department of Computing Science, University of Alberta, Edmonton, Canada; Laboratory of Computational Engineering, Helsinki University of Technology, Helsinki, Finland

  • Venue:
  • IJCAI'01: Proceedings of the 17th International Joint Conference on Artificial Intelligence - Volume 1
  • Year:
  • 2001


Abstract

The temporal difference (TD) learning algorithm offers the hope that the arduous task of manually tuning the evaluation function weights of game-playing programs can be automated. With one exception (TD-Gammon), TD learning has not been demonstrated to be effective in a high-performance, world-class game-playing program. Further, game-program developers have expressed doubt that learned weights could compete with the best hand-tuned weights. Chinook is the World Man-Machine Checkers Champion; its evaluation function weights were hand-tuned over 5 years. This paper shows that TD learning is capable of competing with the best human effort.
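
For readers unfamiliar with the technique, the sketch below illustrates a generic TD(λ) weight update for a linear evaluation function of the form V(s) = w · φ(s). The feature vector, learning rate, and trace parameters here are illustrative assumptions only and are not the ones used in Chinook or described in the paper.

```python
import numpy as np

def td_lambda_update(weights, features_t, features_t1, reward,
                     alpha=0.01, gamma=1.0, lam=0.7, trace=None):
    """One TD(lambda) step for a linear evaluation V(s) = w . phi(s).

    Hypothetical illustration only: the paper's actual feature set,
    update rule, and hyperparameters are not reproduced here.
    """
    if trace is None:
        trace = np.zeros_like(weights)
    v_t = weights @ features_t            # value estimate of current position
    v_t1 = weights @ features_t1          # value estimate of successor position
    delta = reward + gamma * v_t1 - v_t   # TD error between successive estimates
    trace = gamma * lam * trace + features_t   # accumulating eligibility trace
    weights = weights + alpha * delta * trace  # move weights toward the TD target
    return weights, trace
```

In a self-play training loop, this update would be applied after each move, with the reward set to zero for non-terminal positions and to the game outcome at the end.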