Adding Expert Knowledge and Exploration in Monte-Carlo Tree Search

Authors:
Guillaume Chaslot; Christophe Fiter; Jean-Baptiste Hoock; Arpad Rimmel; Olivier Teytaud
Affiliations:
Games and AI Group, MICC, Faculty of Humanities and Sciences, Universiteit Maastricht, Maastricht, The Netherlands (Chaslot); TAO (Inria), LRI, UMR 8623 (CNRS - Univ. Paris-Sud), Orsay, France (Fiter, Hoock, Rimmel, Teytaud)
Venue:
ACG'09 Proceedings of the 12th international conference on Advances in Computer Games
Year:
2009

Abstract

We present a new exploration term that is more efficient than classical UCT-like exploration terms. It efficiently combines expert rules, patterns extracted from datasets, All-Moves-As-First (AMAF) values, and classical online values. Because this improved bandit formula does not resolve several important situations in computer Go (semeais, nakade), we present three other important improvements that are central to the recent progress of our program MoGo. We show an expert-based improvement of Monte-Carlo simulations for nakade situations, and we point out some limitations of this modification. We show a technique that preserves diversity in the Monte-Carlo simulations and greatly improves results on 19x19 boards. Whereas the UCB-based exploration term is not efficient in MoGo, we show a new exploration term that is highly efficient in MoGo. MoGo recently won a game with handicap 7 against a 9-dan professional player, Zhou JunXun, winner of the LG Cup 2007, and a game with handicap 6 against a 1-dan professional player, Li-Chen Chien.
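
To make the idea of a combined bandit score more concrete, here is a minimal Python sketch of one way online statistics, AMAF statistics, and an offline heuristic value (from expert rules or patterns) might be blended into a single move score. It is an illustration under stated assumptions, not the formula used in MoGo: the function names, the weighting schedule `k_amaf / (k_amaf + visits)`, the logarithmic decay of the heuristic bonus, and all constants are hypothetical choices made for this example.

```python
import math


def combined_score(online_wins, online_visits, amaf_wins, amaf_visits,
                   heuristic, k_amaf=100.0, c_heur=1.0):
    """Hypothetical move score blending three sources of information.

    online_*  : win/visit counts from simulations that played this move here
    amaf_*    : All-Moves-As-First win/visit counts for this move
    heuristic : offline prior in [0, 1] (e.g. from expert rules or patterns)
    k_amaf, c_heur : illustrative constants, not values from the paper
    """
    # Empirical means; 0.5 is an assumed neutral default when no data exists.
    online_mean = online_wins / online_visits if online_visits > 0 else 0.5
    amaf_mean = amaf_wins / amaf_visits if amaf_visits > 0 else 0.5

    # Assumed schedule: trust AMAF early, shift to online values with visits.
    beta = k_amaf / (k_amaf + online_visits) if amaf_visits > 0 else 0.0
    value = (1.0 - beta) * online_mean + beta * amaf_mean

    # Assumed exploration bonus: the offline heuristic matters most for
    # rarely visited moves and fades slowly as visits accumulate.
    bonus = c_heur * heuristic / math.log(2.0 + online_visits)
    return value + bonus


def select_move(stats):
    """Return the move with the highest combined score.

    stats maps a move label to the keyword arguments of combined_score.
    """
    return max(stats, key=lambda move: combined_score(**stats[move]))
```

For example, `select_move` can be called with a dictionary such as `{"a": {...}, "b": {...}}` whose values hold the five statistics for each candidate move. In this sketch an unvisited move with a high heuristic value is tried before an unvisited move with a low one, which is the kind of behaviour an expert-guided exploration term is meant to produce.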