Sufficiency-based selection strategy for MCTS

Authors:
Stefan Freyr Gudmundsson;Yngvi Björnsson
Affiliations:
School of Computer Science, Reykjavik University, Iceland;School of Computer Science, Reykjavik University, Iceland
Venue:
IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence
Year:
2013

Citing 10
Cited 0

TD-Gammon, a self-teaching backgammon program, achieves master-level play

Neural Computation
Deep Blue

Artificial Intelligence - Chips challenging champions: games, computers and Artificial Intelligence
Introduction to Reinforcement Learning

Introduction to Reinforcement Learning
Finite-time Analysis of the Multiarmed Bandit Problem

Machine Learning
Pattern Recognition and Machine Learning (Information Science and Statistics)

Pattern Recognition and Machine Learning (Information Science and Statistics)
Combining online and offline knowledge in UCT

Proceedings of the 24th international conference on Machine learning
One Jump Ahead: Computer Perfection at Checkers

One Jump Ahead: Computer Perfection at Checkers
Simulation-based approach to general game playing

AAAI'08 Proceedings of the 23rd national conference on Artificial intelligence - Volume 1
Efficient selectivity and backup operators in Monte-Carlo tree search

CG'06 Proceedings of the 5th international conference on Computers and games
Bandit based monte-carlo planning

ECML'06 Proceedings of the 17th European conference on Machine Learning

Quantified Score

Hi-index	0.00

Visualization

Abstract

Monte-Carlo Tree Search (MCTS) has proved a remarkably effective decision mechanism in many different game domains, including computer Go and general game playing (GGP). However, in GGP, where many disparate games are played, certain type of games have proved to be particularly problematic for MCTS. One of the problems are game trees with so-called optimistic moves, that is, bad moves that superficially look good but potentially require much simulation effort to prove otherwise. Such scenarios can be difficult to identify in real time and can lead to suboptimal or even harmful decisions. In this paper we investigate a selection strategy for MCTS to alleviate this problem. The strategy, called sufficiency threshold, concentrates simulation effort better for resolving potential optimistic move scenarios. The improved strategy is evaluated empirically in an n-arm-bandit test domain for highlighting its properties as well as in a state-of-the-art GGP agent to demonstrate its effectiveness in practice. The new strategy shows significant improvements in both domains.