Comparison training of chess evaluation functions

  • Authors: Gerald Tesauro
  • Affiliations: IBM T.J. Watson Research Center, P.O. Box 704, Yorktown Heights, NY
  • Venue: Machines that learn to play games
  • Year: 2001

Abstract

The supervised learning methodology of "comparison training" (Tesauro 1989a) on a database of expert preferences is extended to search depths beyond 1-ply, and applied to the problem of training the weights in a linear evaluation function for the game of chess. An initial set of experiments was performed using SCP, a public-domain chess program. Training based on simple 1-ply searches was found to be ineffective, but for 1-ply plus quiescence expansion, high-quality solutions were found that outperform SCP's hand-tuned weights. The trained weights' performance scaled well with search depth, and consistent improvement over the hand-tuned solution was found even for test depths much greater than the training search depth.

A discretized version of the algorithm was also developed and used to tune a subset of the weights in DEEP BLUE, having to do primarily with king safety evaluation. Training was based on 4-ply search (plus quiescence), and good test-set generalization was found out to 7-ply. During the 1997 rematch with Garry Kasparov, the tuning of the king-safety weights made a critical difference in one important position in game 2, and in the program's general understanding and handling of game 6.
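The core idea in the abstract can be illustrated with a small sketch. The Python fragment below is not the paper's implementation: the feature count, learning rate, and training data are hypothetical placeholders, and the positions reached by each move (in the paper, obtained from shallow search plus quiescence) are stubbed out as precomputed feature vectors. It shows one plausible reading of comparison training on expert preferences: whenever a rejected alternative scores at least as high as the expert's chosen move under the linear evaluation, the weights are nudged toward the expert's position features and away from the alternative's.

```python
"""Minimal sketch of comparison training for a linear evaluation function.

Assumptions (not from the paper): feature vectors for the position reached
by each candidate move are already available, and a simple perceptron-style
update rule is used.
"""
import numpy as np

NUM_FEATURES = 64      # hypothetical number of evaluation features
LEARNING_RATE = 0.01   # hypothetical step size


def evaluate(weights: np.ndarray, features: np.ndarray) -> float:
    """Linear evaluation: dot product of weight vector and position features."""
    return float(weights @ features)


def comparison_update(weights, expert_feats, alt_feats_list, lr=LEARNING_RATE):
    """One comparison-training update for a single expert preference.

    expert_feats   : features of the position reached by the expert's move.
    alt_feats_list : features of positions reached by the rejected alternatives.

    If any alternative is scored at least as high as the expert's choice,
    move the weights toward the expert features and away from that alternative.
    """
    expert_score = evaluate(weights, expert_feats)
    for alt_feats in alt_feats_list:
        if evaluate(weights, alt_feats) >= expert_score:
            weights += lr * (expert_feats - alt_feats)
    return weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = np.zeros(NUM_FEATURES)
    # Hypothetical stand-in for a database of expert preferences: random
    # feature vectors for the expert's move and ten rejected alternatives.
    for _ in range(1000):
        expert = rng.normal(size=NUM_FEATURES)
        alternatives = [rng.normal(size=NUM_FEATURES) - 0.1 for _ in range(10)]
        weights = comparison_update(weights, expert, alternatives)
    print("trained weight norm:", np.linalg.norm(weights))
```

In this reading, deeper training (1-ply plus quiescence, or 4-ply in the DEEP BLUE experiments) only changes how the feature vectors are obtained, not the update rule itself.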