Estimation of Distribution Algorithms: A New Tool for Evolutionary Computation
Theory of Computing Systems
Combining online and offline knowledge in UCT
Proceedings of the 24th International Conference on Machine Learning
Bandit-based optimization on graphs with application to library performance tuning
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Monte-Carlo simulation balancing
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Monte-Carlo exploration for deterministic planning
IJCAI'09 Proceedings of the 21st International Joint Conference on Artificial Intelligence
Search lessons learned from crossword puzzles
AAAI'90 Proceedings of the Eighth National Conference on Artificial Intelligence - Volume 1
UCD: Upper Confidence Bound for Rooted Directed Acyclic Graphs
TAAI '10 Proceedings of the 2010 International Conference on Technologies and Applications of Artificial Intelligence
Nested Monte-Carlo Search with AMAF Heuristic
TAAI '10 Proceedings of the 2010 International Conference on Technologies and Applications of Artificial Intelligence
Optimization of the nested Monte-Carlo algorithm on the traveling salesman problem with time windows
EvoApplications'11 Proceedings of the 2011 International Conference on Applications of Evolutionary Computation - Volume Part II
A Monte-Carlo AIXI approximation
Journal of Artificial Intelligence Research
LION'12 Proceedings of the 6th International Conference on Learning and Intelligent Optimization
Investigating monte-carlo methods on the weak schur problem
EvoCOP'13 Proceedings of the 13th European Conference on Evolutionary Computation in Combinatorial Optimization
Monte Carlo tree search (MCTS) methods have had recent success in games, planning, and optimization. MCTS uses results from rollouts to guide search; a rollout is a path that descends the tree with a randomized decision at each ply until reaching a leaf. MCTS results can be strongly influenced by the choice of policy used to bias the rollouts. Most previous work on MCTS uses static uniform random or domain-specific policies. We describe a new MCTS method for deterministic optimization problems that dynamically adapts the rollout policy during search. Our starting point is Cazenave's original Nested Monte Carlo Search (NMCS), but rather than navigating the tree directly we instead use gradient ascent on the rollout policy at each level of the nested search. We benchmark this new Nested Rollout Policy Adaptation (NRPA) algorithm and examine its behavior. Our test problems are instances of Crossword Puzzle Construction and Morpion Solitaire. Over moderate time scales NRPA can substantially improve search efficiency compared to NMCS, and over longer time scales NRPA improves upon all previously published solutions for the test problems. Results include a new Morpion Solitaire solution that improves upon the previous human-generated record, which had stood for over 30 years.
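The nested scheme the abstract describes — level-0 rollouts sampled from a softmax over move weights, with each higher level adapting the policy toward the best sequence found below it — can be sketched in Python. The toy objective, the `(step, move)` policy keys, and all constants here are illustrative assumptions, not details from the paper:

```python
import math
import random

# Toy deterministic problem (an assumption for illustration): recover a
# hidden move pattern; score = number of positions matching the target.
TARGET = [0, 2, 1, 1, 0, 2]
MOVES = (0, 1, 2)  # the same moves are legal at every step in this toy

def score(seq):
    return sum(1 for a, b in zip(seq, TARGET) if a == b)

def rollout(policy):
    """Level 0: play one sequence, sampling each move with softmax(policy)."""
    seq = []
    for step in range(len(TARGET)):
        weights = [math.exp(policy.get((step, m), 0.0)) for m in MOVES]
        seq.append(random.choices(MOVES, weights=weights)[0])
    return score(seq), seq

def adapt(policy, seq, alpha=1.0):
    """Gradient step on the rollout policy: raise the weight of each move
    in the best sequence, lower its competitors in proportion to their
    current softmax probability."""
    new = dict(policy)
    for step, move in enumerate(seq):
        z = sum(math.exp(policy.get((step, m), 0.0)) for m in MOVES)
        for m in MOVES:
            p = math.exp(policy.get((step, m), 0.0)) / z
            new[(step, m)] = new.get((step, m), 0.0) - alpha * p
        new[(step, move)] = new.get((step, move), 0.0) + alpha
    return new

def nrpa(level, policy, iterations=20):
    """Each level repeatedly calls the level below on a copy of the policy
    and adapts its own copy toward the best sequence found so far."""
    if level == 0:
        return rollout(policy)
    best_score, best_seq = -1, None
    for _ in range(iterations):
        s, seq = nrpa(level - 1, dict(policy))
        if s >= best_score:
            best_score, best_seq = s, seq
        policy = adapt(policy, best_seq)
    return best_score, best_seq

random.seed(0)
best, seq = nrpa(2, {})
print(best, seq)
```

Passing a copy of the policy down and adapting only the local copy is what distinguishes this from a single flat adaptive search: each level restarts exploration from its parent's policy while steering toward its own best result.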