Improving the exploration in upper confidence trees

Authors:
Adrien Couëtoux; Hassen Doghmen; Olivier Teytaud
Affiliations:
Adrien Couëtoux: TAO-INRIA, LRI, CNRS UMR 8623, Université Paris-Sud, Orsay, France; OASE Lab, National University of Tainan, Taiwan; Artelys, Paris, France
Hassen Doghmen: TAO-INRIA, LRI, CNRS UMR 8623, Université Paris-Sud, Orsay, France
Olivier Teytaud: TAO-INRIA, LRI, CNRS UMR 8623, Université Paris-Sud, Orsay, France; OASE Lab, National University of Tainan, Taiwan
Venue:
LION'12 Proceedings of the 6th international conference on Learning and Intelligent Optimization
Year:
2012

Abstract

In the standard version of the UCT algorithm, when the set of decisions is continuous, new decisions are explored through blind search. This can make exploration very inefficient, particularly in high-dimensional problems, which often arise in energy management, for instance. To use the information gathered through past simulations to better explore new decisions, we propose a method named Blind Value (BV). It requires only access to a function that randomly draws feasible decisions. We implement it and compare it to the original version of continuous UCT. Our results show a significant increase in convergence speed in dimensions 12 and 80.
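
The contrast between blind search and exploration informed by past simulations can be illustrated with a minimal sketch. It is not the paper's definition of Blind Value, which the abstract does not give. It assumes a hypothetical sampler `blind_draw` over the unit hypercube and a hypothetical scoring rule in which each random candidate receives the smallest "distance plus optimistic value" over the decisions already tried, and the highest-scoring candidate is kept. The names `informed_draw`, `rho`, and `n_candidates` are illustrative assumptions.

```python
import math
import random


def blind_draw(dim, rng):
    """Hypothetical sampler: one feasible decision, uniform in [0, 1]^dim."""
    return [rng.random() for _ in range(dim)]


def distance(a, b):
    """Euclidean distance between two decisions."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def informed_draw(visited, scores, dim, rng, n_candidates=20, rho=1.0):
    """Choose a new decision using past simulations (illustrative rule only).

    visited: decisions already tried at the current node
    scores:  one optimistic value estimate per visited decision
    rho:     assumed weight given to distance from visited decisions
    """
    if not visited:
        # Nothing to learn from yet: fall back to blind search.
        return blind_draw(dim, rng)

    # The only access to the decision space is the random sampler.
    candidates = [blind_draw(dim, rng) for _ in range(n_candidates)]

    def optimistic(candidate):
        # A candidate scores high only if, relative to every visited
        # decision, it is either far away or near a well-scored one.
        return min(rho * distance(candidate, v) + s
                   for v, s in zip(visited, scores))

    return max(candidates, key=optimistic)
```

With `n_candidates=1` the sketch reduces to blind search, so the number of candidates controls how much the past simulations influence the choice.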