Expected-Outcome: A General Model of Static Evaluation
IEEE Transactions on Pattern Analysis and Machine Intelligence
Local Gain Adaptation in Stochastic Gradient Descent
Combining online and offline knowledge in UCT
Proceedings of the 24th international conference on Machine learning
Monte-Carlo simulation balancing
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Associating domain-dependent knowledge and Monte Carlo approaches within a Go program
Information Sciences: an International Journal
Efficient selectivity and backup operators in Monte-Carlo tree search
CG'06 Proceedings of the 5th international conference on Computers and games
Bandit based Monte-Carlo planning
ECML'06 Proceedings of the 17th European conference on Machine Learning
Adding expert knowledge and exploration in Monte-Carlo tree search
ACG'09 Proceedings of the 12th international conference on Advances in Computer Games
Monte-Carlo tree search and rapid action value estimation in computer Go
Artificial Intelligence
Parallel Monte-Carlo tree search for HPC systems
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part II
The grand challenge of computer Go: Monte Carlo tree search and extensions
Communications of the ACM
Simulation balancing is a new technique for tuning the parameters of a playout policy in a Monte-Carlo game-playing program. Until now, the algorithm had been tested only in an artificial setting: it was limited to 5×5 and 6×6 Go and required a stronger external program to serve as a supervisor. In this paper, the effectiveness of simulation balancing is demonstrated in a realistic setting. A state-of-the-art program, Erica, learned an improved playout policy on the 9×9 board without requiring any external expert to provide position evaluations; the target evaluations were collected by letting the program analyze positions by itself. The previous version of Erica learned pattern weights with the minorization-maximization algorithm. Thanks to simulation balancing, Erica's winning rate against Fuego 0.4 improved from 69% to 78%.
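The core idea of simulation balancing (Silver and Tegner, ICML'09, listed above) is to adjust the playout-policy parameters so that the *average* playout outcome from a position matches a target evaluation, using a policy-gradient update. The following is a minimal, hypothetical sketch on a toy one-move "game" rather than Go: the softmax policy, the two "pattern" weights, the outcomes, and the target value 0.5 are all illustrative assumptions, not the setup used by Erica.

```python
import math
import random

random.seed(0)

def softmax(theta):
    # Numerically stable softmax over the pattern weights.
    m = max(theta)
    e = [math.exp(t - m) for t in theta]
    s = sum(e)
    return [x / s for x in e]

# Toy one-move "game": action 0 wins (+1), action 1 loses (-1).
OUTCOME = [1.0, -1.0]
V_TARGET = 0.5            # target evaluation the playouts should reproduce
theta = [0.0, 0.0]        # playout-policy pattern weights (illustrative)
alpha = 0.1               # learning rate
M = 1000                  # playouts per estimate

for step in range(200):
    p = softmax(theta)
    # Monte-Carlo estimates of the playout value V and of the
    # policy-gradient term E[z * grad log pi].
    v_hat = 0.0
    grad = [0.0, 0.0]
    for _ in range(M):
        a = 0 if random.random() < p[0] else 1
        z = OUTCOME[a]
        v_hat += z / M
        # grad log pi(a) = one_hot(a) - p for a softmax over one-hot features.
        for i in range(2):
            grad[i] += z * ((1.0 if i == a else 0.0) - p[i]) / M
    # Simulation-balancing update: push the playout value toward the target.
    for i in range(2):
        theta[i] += alpha * (V_TARGET - v_hat) * grad[i]

p = softmax(theta)
v_final = 2 * p[0] - 1    # exact playout value of the learned policy
print(round(v_final, 2))  # should be close to V_TARGET = 0.5
```

In this toy setting the update drives the exact playout value 2·p[0] − 1 toward the target 0.5, i.e. the learned policy picks the winning move about 75% of the time; the same gradient structure applies when the outcomes come from full Go playouts and the targets from the program's own deep analyses, as described in the abstract.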