Efficient selectivity and backup operators in Monte-Carlo tree search

  • Authors: Rémi Coulom
  • Affiliations: CNRS-LIFL, INRIA-SequeL, Université Charles de Gaulle, Lille, France
  • Venue: CG'06 Proceedings of the 5th international conference on Computers and games
  • Year: 2006

Abstract

A Monte-Carlo evaluation estimates the value of a position by averaging the outcomes of several random continuations. The method can serve as an evaluation function at the leaves of a min-max tree. This paper presents a new framework for combining tree search with Monte-Carlo evaluation that does not separate a min-max phase from a Monte-Carlo phase. Instead of backing up the min-max value close to the root and the average value at some depth, it defines a more general backup operator that progressively changes from averaging to min-max as the number of simulations grows. This approach provides fine-grained control of tree growth, at the level of individual simulations, and allows efficient selectivity. The resulting algorithm was implemented in a 9 × 9 Go-playing program, Crazy Stone, which won the 10th KGS computer-Go tournament.
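
The idea of a backup operator that moves from averaging toward min-max can be illustrated with a minimal sketch. This is not the operator defined in the paper: the function name `blended_backup`, the smoothing constant `k`, and the weighting rule `n / (n + k)` are all assumptions chosen for illustration. It also assumes that child values are expressed from the point of view of the player choosing among them, so that min-max reduces to a max.

```python
def blended_backup(child_means, child_counts, k=16.0):
    """Illustrative backup: blend the average and the max of child values.

    child_means  -- mean simulation outcome of each child (hypothetical input)
    child_counts -- number of simulations run through each child
    k            -- assumed smoothing constant; larger k keeps averaging longer
    """
    total = sum(child_counts)
    if total == 0:
        raise ValueError("no simulations to back up")

    # Ignore children that have never been simulated.
    visited = [(m, n) for m, n in zip(child_means, child_counts) if n > 0]

    # Averaging backup: mean outcome over all simulations below this node.
    average = sum(m * n for m, n in visited) / total

    # Max backup: value of the best child for the player to move.
    best = max(m for m, _ in visited)

    # Weight on the max term grows from 0 toward 1 as simulations accumulate.
    weight = total / (total + k)
    return (1.0 - weight) * average + weight * best


if __name__ == "__main__":
    means = [0.2, 0.8]
    print(blended_backup(means, [1, 1]))        # few simulations: near the average
    print(blended_backup(means, [1000, 1000]))  # many simulations: near the max
```

With few simulations the returned value stays close to the plain average, which is less sensitive to noisy child estimates; with many simulations it approaches the value of the best child, as a min-max backup would.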