First Results from Using Temporal Difference Learning in Shogi

Authors:
Donald F. Beal; Martin C. Smith
Venue:
CG '98 Proceedings of the First International Conference on Computers and Games
Year:
1998

Citing 3

TD-Gammon, a self-teaching backgammon program, achieves master-level play
Neural Computation

Learning to Predict by the Methods of Temporal Differences
Machine Learning

KnightCap: A Chess Program That Learns by Combining TD(lambda) with Game-Tree Search
ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning

Cited 2

Game playing (invited talk): the next moves
AAAI '99/IAAI '99 Proceedings of the sixteenth national conference on Artificial intelligence and the eleventh Innovative applications of artificial intelligence conference

Chess Neighborhoods, Function Combination, and Reinforcement Learning
CG '00 Revised Papers from the Second International Conference on Computers and Games

Abstract

This paper describes first results from the application of Temporal Difference learning [1] to shogi. We report on experiments to determine whether sensible values for shogi pieces can be obtained in the same manner as for Western chess pieces [2]. The learning is obtained entirely from randomised self-play, without access to any form of expert knowledge. The piece values are used in a simple search program that chooses shogi moves from a shallow lookahead, using piece values to evaluate the leaves, with a random tie-break at the top level. Temporal difference learning is used to adjust the piece values over the course of a series of games. The method is successful in learning values that perform well in matches against hand-crafted values.
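
As a rough illustration of how piece values might be adjusted from game outcomes, the following Python sketch applies a generic TD(lambda) update to a linear material evaluation. It is not taken from the paper: the abstract does not specify the TD variant, the squashing function, or any parameter settings, so the sigmoid prediction, the learning rate `alpha`, the decay `lam`, and the names `predict` and `td_lambda_update` are all assumptions made for this example. Each position is assumed to be summarised as a vector of piece-count differences (own pieces minus opponent pieces, one entry per piece type).

```python
import math


def predict(weights, features):
    """Squash a linear material score into a win probability in (0, 1).

    Assumption: a logistic squashing of the weighted piece-count difference.
    """
    score = sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))


def td_lambda_update(weights, positions, outcome, alpha=0.05, lam=0.9):
    """Return updated piece values after one game (hypothetical helper).

    positions: feature vectors for successive positions, all seen from
               the same player's viewpoint.
    outcome:   1.0 for a win, 0.0 for a loss, 0.5 for a draw.
    """
    n = len(weights)
    trace = [0.0] * n   # decaying sum of past prediction gradients
    delta = [0.0] * n   # weight change accumulated over the game

    preds = [predict(weights, x) for x in positions]
    # Each prediction is trained toward the next one; the last prediction
    # is trained toward the actual game result.
    targets = preds[1:] + [outcome]

    for x, p, target in zip(positions, preds, targets):
        grad_scale = p * (1.0 - p)  # derivative of the logistic function
        trace = [lam * e + grad_scale * f for e, f in zip(trace, x)]
        td_error = target - p
        delta = [d + alpha * td_error * e for d, e in zip(delta, trace)]

    return [w + d for w, d in zip(weights, delta)]


if __name__ == "__main__":
    # Two hypothetical piece types; the player is one piece of type 0 ahead
    # in every recorded position and goes on to win.
    start = [0.0, 0.0]
    game = [[1, 0], [1, 0], [1, 0]]
    print(td_lambda_update(start, game, outcome=1.0))
```

In this toy game the value of the first piece type rises after a win and would fall after a loss, while the second piece type, which never differs between the two sides, keeps its value. Repeating such updates over many self-play games is the general idea the abstract describes; the search program and the random tie-break it mentions are outside the scope of this sketch.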