First Results from Using Temporal Difference Learning in Shogi

  • Authors:
  • Donald F. Beal; Martin C. Smith

  • Venue:
  • CG '98 Proceedings of the First International Conference on Computers and Games
  • Year:
  • 1998

Abstract

This paper describes first results from applying Temporal Difference learning [1] to shogi. We report on experiments to determine whether sensible values for shogi pieces can be obtained in the same manner as for western chess pieces [2]. The values are learned entirely from randomised self-play, without access to any form of expert knowledge. They are used in a simple search program that chooses shogi moves by shallow lookahead, using the piece values to evaluate the leaves, with a random tie-break at the top level. Temporal difference learning adjusts the piece values over the course of a series of games. The method succeeds in learning values that perform well in matches against hand-crafted values.
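
The abstract does not spell out the update rule, so the following is only a minimal sketch of how piece values might be adjusted by TD(λ) after one self-play game. It is not the authors' implementation. It assumes, purely for illustration, a linear material evaluation squashed by a logistic function, feature vectors holding piece-count differences from one side's point of view, and hypothetical names and parameters (`predict`, `td_lambda_update`, `alpha`, `lam`).

```python
import math


def predict(weights, features):
    """Squash the weighted material balance into a value in (0, 1).

    Assumption: a logistic squashing function; the abstract does not
    say which function, if any, was used.
    """
    score = sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))


def td_lambda_update(weights, positions, outcome, alpha=0.05, lam=0.9):
    """Return updated piece values after one game (hypothetical helper).

    positions: feature vectors (piece-count differences) for the
               successive positions of the game, from one side's view.
    outcome:   1.0 for a win, 0.0 for a loss, 0.5 for a draw.
    alpha:     learning rate (illustrative value).
    lam:       trace decay parameter lambda (illustrative value).
    """
    # Predictions are made with the weights held fixed for the whole game.
    preds = [predict(weights, x) for x in positions]
    # Each prediction is trained towards the next one; the last towards the result.
    targets = preds[1:] + [outcome]

    new_weights = list(weights)
    trace = [0.0] * len(weights)
    for x, p, nxt in zip(positions, preds, targets):
        # Gradient of the logistic prediction with respect to each weight.
        grad_scale = p * (1.0 - p)
        # Eligibility trace: decayed sum of past gradients.
        trace = [lam * e + grad_scale * f for e, f in zip(trace, x)]
        delta = nxt - p  # temporal difference between successive predictions
        new_weights = [w + alpha * delta * e for w, e in zip(new_weights, trace)]
    return new_weights
```

Calling `td_lambda_update` once per game, and feeding the returned values into the next game's leaf evaluation, would give the kind of game-by-game adjustment the abstract describes. In this sketch a won game in which one side was a piece up nudges that piece's value upwards, and a lost game nudges it downwards.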