Chips challenging champions: games, computers and Artificial Intelligence. Artificial Intelligence.
World-championship-caliber Scrabble. Artificial Intelligence.
Finite-time Analysis of the Multiarmed Bandit Problem. Machine Learning.
Combining online and offline knowledge in UCT. Proceedings of the 24th International Conference on Machine Learning (ICML '07).
A sparse sampling algorithm for near-optimal planning in large Markov decision processes. Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI '99), Volume 2.
Bandit based Monte-Carlo planning. Proceedings of the 17th European Conference on Machine Learning (ECML '06).
The UCT algorithm, proposed by Kocsis et al. [3], applies the multi-armed bandit framework to tree-structured search spaces and has achieved remarkable success in several challenging domains [2]. In UCT, Monte-Carlo simulations are performed under the guidance of the UCB1 formula, and their outcomes are averaged to evaluate a given action. We observe that, as more simulations are performed, later ones usually yield more accurate results, partly because later simulations search to a greater depth and partly because more prior results are available to direct them. This paper presents a new method that improves the performance of the UCT algorithm by increasing the feedback weight of later simulations. Experimental results on the classical game of Go show that our approach significantly improves the quality of Monte-Carlo evaluation when exponential weighting models are used.
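The mechanism described above — simulations selected by UCB1, with later results weighted more heavily than a plain average — can be sketched as follows. This is a minimal illustration on a two-armed bandit, not the paper's implementation: the environment, the decay factor `gamma`, and all names are assumptions introduced for the example.

```python
import math
import random

C = math.sqrt(2)  # UCB1 exploration constant (a common default, assumed here)

def ucb1(total_visits, child_value, child_visits):
    """UCB1 score: exploit the running value, explore rarely tried arms."""
    if child_visits == 0:
        return float("inf")
    return child_value + C * math.sqrt(math.log(total_visits) / child_visits)

class Arm:
    def __init__(self, p):
        self.p = p          # hidden payoff probability of this arm
        self.visits = 0
        self.value = 0.0    # weighted average of simulation results
        self.weight_sum = 0.0

def run(arms, n_sims, gamma=1.01):
    """Simulation i carries weight gamma**i, so later (better-informed)
    simulations contribute more to the average than earlier ones."""
    total = 0
    for i in range(n_sims):
        total += 1
        # Select the arm maximizing the UCB1 formula.
        arm = max(arms, key=lambda a: ucb1(total, a.value, a.visits))
        reward = 1.0 if random.random() < arm.p else 0.0
        # Exponential model: replace the plain mean with a weighted
        # incremental mean, where weight w grows with the simulation index.
        w = gamma ** i
        arm.weight_sum += w
        arm.visits += 1
        arm.value += (w / arm.weight_sum) * (reward - arm.value)
    # The most-visited arm is the recommended action.
    return max(arms, key=lambda a: a.visits)

random.seed(0)
arms = [Arm(0.2), Arm(0.8)]
best = run(arms, 5000)
```

With `gamma = 1.0` this reduces to the ordinary averaged evaluation; values of `gamma` slightly above 1 shift the evaluation toward the later, deeper simulations.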