Using probabilistic knowledge and simulation to play poker
AAAI '99/IAAI '99 Proceedings of the Sixteenth National Conference on Artificial Intelligence and the Eleventh Innovative Applications of Artificial Intelligence Conference
World-championship-caliber Scrabble
Artificial Intelligence - Chips challenging champions: games, computers and Artificial Intelligence
Combining online and offline knowledge in UCT
Proceedings of the 24th International Conference on Machine Learning
Simulation-based approach to general game playing
AAAI'08 Proceedings of the 23rd National Conference on Artificial Intelligence - Volume 1
Reinforcement learning of local shape in the game of Go
IJCAI'07 Proceedings of the 20th International Joint Conference on Artificial Intelligence
Bandit based Monte-Carlo planning
ECML'06 Proceedings of the 17th European Conference on Machine Learning
Monte-Carlo simulation balancing in practice
CG'10 Proceedings of the 7th International Conference on Computers and Games
Monte-Carlo tree search and rapid action value estimation in computer Go
Artificial Intelligence
A Monte-Carlo AIXI approximation
Journal of Artificial Intelligence Research
Parallel Monte-Carlo tree search for HPC systems
Euro-Par'11 Proceedings of the 17th International Conference on Parallel Processing - Volume Part II
Evolutionary learning of policies for MCTS simulations
Proceedings of the International Conference on the Foundations of Digital Games
Nested rollout policy adaptation for Monte Carlo tree search
IJCAI'11 Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence - Volume One
Design and parametric considerations for artificial neural network pruning in UCT game playing
Proceedings of the South African Institute for Computer Scientists and Information Technologists Conference
In this paper we introduce the first algorithms for efficiently learning a simulation policy for Monte-Carlo search. Our main idea is to optimise the balance of a simulation policy, so that an accurate spread of simulation outcomes is maintained, rather than optimising the direct strength of the simulation policy. We develop two algorithms for balancing a simulation policy by gradient descent. The first algorithm optimises the balance of complete simulations, using a policy gradient algorithm; the second optimises the balance over every two steps of simulation. We compare our algorithms to reinforcement learning and supervised learning algorithms that maximise the strength of the simulation policy. We test each algorithm in the domain of 5×5 and 6×6 Computer Go, using a softmax policy that is parameterised by weights for a hundred simple patterns. When used in a simple Monte-Carlo search, the policies learnt by simulation balancing achieved significantly better performance, with half the mean squared error of a uniform random policy, and similar overall performance to a sophisticated Go engine.
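The core loop described in the abstract can be sketched in a few lines: a softmax simulation policy scored by pattern weights, updated by a policy-gradient step that reduces the gap between the mean simulation outcome and a target value, rather than maximising the outcome itself. This is a minimal illustrative sketch, not the paper's implementation; the function names, the learning rate, and the flat feature encoding (each action carries a list of active pattern indices) are assumptions made for the example.

```python
import math

def softmax_probs(weights, features_per_action):
    """Softmax policy: each action is scored by the sum of the
    weights of its active pattern features."""
    scores = [sum(weights[f] for f in feats) for feats in features_per_action]
    m = max(scores)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def balancing_update(weights, episodes, target_value, alpha=0.01):
    """One whole-simulation balancing step (hypothetical sketch):
    nudge the policy so the MEAN simulation outcome moves toward the
    target value V*(s), instead of maximising the outcome directly.
    `episodes` is a list of (steps, outcome) pairs, where each step is
    (features_per_action, chosen_action_index)."""
    mean_outcome = sum(out for _, out in episodes) / len(episodes)
    error = target_value - mean_outcome  # the balance error to shrink
    grad = [0.0] * len(weights)
    for steps, _ in episodes:
        for feats_per_action, chosen in steps:
            probs = softmax_probs(weights, feats_per_action)
            # d log pi / d w_f = phi_f(chosen action) - E_pi[phi_f]
            for a, feats in enumerate(feats_per_action):
                for f in feats:
                    grad[f] += (1.0 if a == chosen else 0.0) - probs[a]
    for f in range(len(weights)):
        weights[f] += alpha * error * grad[f] / len(episodes)
    return weights
```

With zero weights the policy is uniform; after one update on a simulation that scored below target, the weights of the patterns played in that simulation rise only in proportion to the balance error, so the update vanishes once the mean outcome matches the target, which is the balancing objective rather than the strength objective.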