Using probabilistic knowledge and simulation to play poker
AAAI '99/IAAI '99 Proceedings of the Sixteenth National Conference on Artificial Intelligence and the Eleventh Innovative Applications of Artificial Intelligence Conference
World-championship-caliber Scrabble
Artificial Intelligence - Chips challenging champions: games, computers and Artificial Intelligence
Combining online and offline knowledge in UCT
Proceedings of the 24th International Conference on Machine Learning
Simulation-based approach to general game playing
AAAI'08 Proceedings of the 23rd National Conference on Artificial Intelligence - Volume 1
Reinforcement learning of local shape in the game of Go
IJCAI'07 Proceedings of the 20th International Joint Conference on Artificial Intelligence
Bandit based Monte-Carlo planning
ECML'06 Proceedings of the 17th European Conference on Machine Learning
Monte-Carlo simulation balancing in practice
CG'10 Proceedings of the 7th International Conference on Computers and Games
Monte-Carlo tree search and rapid action value estimation in computer Go
Artificial Intelligence
A Monte-Carlo AIXI approximation
Journal of Artificial Intelligence Research
Parallel Monte-Carlo tree search for HPC systems
Euro-Par'11 Proceedings of the 17th International Conference on Parallel Processing - Volume Part II
Evolutionary learning of policies for MCTS simulations
Proceedings of the International Conference on the Foundations of Digital Games
Nested rollout policy adaptation for Monte Carlo tree search
IJCAI'11 Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence - Volume One
Design and parametric considerations for artificial neural network pruning in UCT game playing
Proceedings of the South African Institute for Computer Scientists and Information Technologists Conference
In this paper we introduce the first algorithms for efficiently learning a simulation policy for Monte-Carlo search. Our main idea is to optimise the balance of a simulation policy, so that an accurate spread of simulation outcomes is maintained, rather than optimising the direct strength of the simulation policy. We develop two algorithms for balancing a simulation policy by gradient descent. The first algorithm optimises the balance of complete simulations, using a policy gradient algorithm; the second optimises the balance over every two steps of simulation. We compare our algorithms to reinforcement learning and supervised learning algorithms that maximise the strength of the simulation policy. We test each algorithm in the domain of 5×5 and 6×6 Computer Go, using a softmax policy that is parameterised by weights for a hundred simple patterns. When used in a simple Monte-Carlo search, the policies learnt by simulation balancing achieved significantly better performance, with half the mean squared error of a uniform random policy, and similar overall performance to a sophisticated Go engine.
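The core loop described in the abstract can be sketched in a few lines: a softmax simulation policy scored by pattern weights, updated by a policy-gradient step that reduces the gap between the mean simulation outcome and a target value, rather than maximising the outcome itself. This is a minimal illustrative sketch, not the paper's implementation; the function names, the learning rate, and the flat feature encoding (each action carries a list of active pattern indices) are assumptions made for the example.

```python
import math

def softmax_probs(weights, features_per_action):
    """Softmax policy: each action is scored by the sum of the
    weights of its active pattern features."""
    scores = [sum(weights[f] for f in feats) for feats in features_per_action]
    m = max(scores)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def balancing_update(weights, episodes, target_value, alpha=0.01):
    """One whole-simulation balancing step (hypothetical sketch):
    nudge the policy so the MEAN simulation outcome moves toward the
    target value V*(s), instead of maximising the outcome directly.
    `episodes` is a list of (steps, outcome) pairs, where each step is
    (features_per_action, chosen_action_index)."""
    mean_outcome = sum(out for _, out in episodes) / len(episodes)
    error = target_value - mean_outcome  # the balance error to shrink
    grad = [0.0] * len(weights)
    for steps, _ in episodes:
        for feats_per_action, chosen in steps:
            probs = softmax_probs(weights, feats_per_action)
            # d log pi / d w_f = phi_f(chosen action) - E_pi[phi_f]
            for a, feats in enumerate(feats_per_action):
                for f in feats:
                    grad[f] += (1.0 if a == chosen else 0.0) - probs[a]
    for f in range(len(weights)):
        weights[f] += alpha * error * grad[f] / len(episodes)
    return weights
```

With zero weights the policy is uniform; after one update on a simulation that scored below target, the weights of the patterns played in that simulation rise only in proportion to the balance error, so the update vanishes once the mean outcome matches the target, which is the balancing objective rather than the strength objective.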