Evolutionary learning of policies for MCTS simulations

  • Authors:
  • James Pettit; David Helmbold

  • Affiliations:
  • U.C. Santa Cruz, Santa Cruz, California (both authors)

  • Venue:
  • Proceedings of the International Conference on the Foundations of Digital Games
  • Year:
  • 2012

Abstract

Monte-Carlo Tree Search (MCTS) grows a partial game tree and uses a large number of random simulations to approximate the values of its nodes. It has proven effective in games such as Go and Hex, where the large search space and the difficulty of evaluating positions cause problems for standard methods. The best MCTS players use carefully hand-crafted rules to bias the random simulations. Obtaining good hand-crafted rules is very difficult, as even rules that promote better simulation play can result in a weaker MCTS system [12]. Our Hivemind system uses evolution strategies to automatically learn effective rules for biasing the random simulations. We have built an MCTS player for the game Hex using Hivemind. The rules learned by Hivemind yield a 90% win rate against a baseline MCTS system and a significant improvement against the computer Hex world champion, MoHex.
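The abstract does not spell out the learning loop, but one common way to combine evolution strategies with MCTS simulation policies is to encode the bias rules as a weight vector, pick rollout moves via a softmax over weighted move features, and evolve the weights against a fitness such as win rate. The sketch below assumes that setup; `biased_rollout_move`, `evolve_weights`, and the feature representation are hypothetical illustrations, not the paper's actual Hivemind interface.

```python
import math
import random

def biased_rollout_move(moves, features, weights):
    """Sample a simulation move via a softmax over weighted move features.

    `moves` is the list of legal moves, `features[m]` a feature vector
    for move m, and `weights` the bias parameters being learned.
    (All names here are hypothetical, for illustration only.)
    """
    scores = [math.exp(sum(w * f for w, f in zip(weights, features[m])))
              for m in moves]
    r = random.uniform(0.0, sum(scores))
    for move, s in zip(moves, scores):
        r -= s
        if r <= 0:
            return move
    return moves[-1]

def evolve_weights(fitness, dim, generations=50, pop_size=20, sigma=0.3):
    """A (1+lambda) evolution strategy over the rollout-bias weights.

    `fitness(weights)` should estimate playing strength, e.g. the win
    rate of an MCTS player whose simulations use those weights.
    """
    best = [0.0] * dim
    best_fit = fitness(best)
    for _ in range(generations):
        # Mutate the parent pop_size times with Gaussian noise,
        # then keep the best child only if it beats the parent.
        children = [[w + random.gauss(0.0, sigma) for w in best]
                    for _ in range(pop_size)]
        top_fit, top = max(((fitness(c), c) for c in children),
                           key=lambda t: t[0])
        if top_fit > best_fit:
            best, best_fit = top, top_fit
    return best
```

In any scheme like this, the fitness call dominates the cost, since each evaluation means playing full games with the candidate weights; this is consistent with the paper measuring improvement as win rate against a baseline MCTS system.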