TD-Gammon, a self-teaching backgammon program, achieves master-level play
Neural Computation
Artificial Intelligence - Chips challenging champions: games, computers and Artificial Intelligence
Introduction to Reinforcement Learning
Introduction to Reinforcement Learning
Finite-time Analysis of the Multiarmed Bandit Problem
Machine Learning
Pattern Recognition and Machine Learning (Information Science and Statistics)
Pattern Recognition and Machine Learning (Information Science and Statistics)
Combining online and offline knowledge in UCT
Proceedings of the 24th international conference on Machine learning
One Jump Ahead: Computer Perfection at Checkers
One Jump Ahead: Computer Perfection at Checkers
Simulation-based approach to general game playing
AAAI'08 Proceedings of the 23rd national conference on Artificial intelligence - Volume 1
Efficient selectivity and backup operators in Monte-Carlo tree search
CG'06 Proceedings of the 5th international conference on Computers and games
Bandit based monte-carlo planning
ECML'06 Proceedings of the 17th European conference on Machine Learning
Hi-index | 0.00 |
Monte-Carlo Tree Search (MCTS) has proved a remarkably effective decision mechanism in many different game domains, including computer Go and general game playing (GGP). However, in GGP, where many disparate games are played, certain type of games have proved to be particularly problematic for MCTS. One of the problems are game trees with so-called optimistic moves, that is, bad moves that superficially look good but potentially require much simulation effort to prove otherwise. Such scenarios can be difficult to identify in real time and can lead to suboptimal or even harmful decisions. In this paper we investigate a selection strategy for MCTS to alleviate this problem. The strategy, called sufficiency threshold, concentrates simulation effort better for resolving potential optimistic move scenarios. The improved strategy is evaluated empirically in an n-arm-bandit test domain for highlighting its properties as well as in a state-of-the-art GGP agent to demonstrate its effectiveness in practice. The new strategy shows significant improvements in both domains.