Upper Confidence Trees (UCT) are a very efficient tool for solving Markov Decision Processes; originating in difficult games such as the game of Go, they are surprisingly efficient in high-dimensional problems. It is known that they can be adapted to continuous domains in some cases (in particular continuous action spaces). We here present an extension of Upper Confidence Trees to continuous stochastic problems. We (i) exhibit a deceptive problem on which the classical Upper Confidence Tree approach fails, even with arbitrarily large computational power and with progressive widening; (ii) propose an improvement, termed double-progressive widening, which handles the compromise between variance (we want infinitely many simulations for each action/state) and bias (we want sufficiently many nodes to avoid a bias introduced by the first nodes) and which extends the classical progressive widening; (iii) discuss its consistency and show experimentally that it performs well both on the deceptive problem and on experimental benchmarks. We conjecture that the double-progressive widening trick can be used for other algorithms as well, as a general tool for ensuring a good bias/variance compromise in search algorithms.
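To make the idea concrete, the following is a minimal Python sketch of UCT with double-progressive widening under stated assumptions: the toy one-dimensional stochastic MDP, the constants ALPHA, KW, UCB_C and HORIZON, and all function and field names are illustrative choices of ours, not the setup, benchmarks or implementation of the paper. The sketch only shows where the two widening tests enter the simulation loop: one limits how fast new continuous actions are added at a state, and the second limits how fast new sampled outcomes are added under an action, so that existing children keep receiving simulations (variance control) while the tree still grows (bias control).

import math
import random

# Toy continuous MDP, purely illustrative (not from the paper):
# the state drifts under the chosen action, and reward prefers states near 0.
def sample_action():
    return random.uniform(-1.0, 1.0)

def transition(state, action):
    next_state = state + action + random.gauss(0.0, 0.1)  # stochastic transition
    return next_state, -abs(next_state)                    # immediate reward

ALPHA, KW, UCB_C, HORIZON = 0.5, 1.0, 1.0, 5  # illustrative constants

class Node:
    """Decision node for a state; each arm stores its sampled outcomes."""
    def __init__(self, state):
        self.state = state
        self.visits = 0
        self.arms = []  # each arm: {"action", "visits", "value", "children"}

def simulate(node, depth):
    """One UCT rollout from `node`; returns the sampled cumulative reward."""
    if depth == HORIZON:
        return 0.0
    node.visits += 1

    # Progressive widening on ACTIONS: add a new (continuous) action only
    # while the number of arms is at most KW * visits^ALPHA, otherwise
    # select among the existing arms by an upper confidence bound.
    if len(node.arms) <= KW * node.visits ** ALPHA:
        arm = {"action": sample_action(), "visits": 0, "value": 0.0, "children": []}
        node.arms.append(arm)
    else:
        def ucb(a):
            return a["value"] + UCB_C * math.sqrt(math.log(node.visits) / a["visits"])
        arm = max(node.arms, key=ucb)
    arm["visits"] += 1

    # DOUBLE progressive widening, on the stochastic OUTCOMES of the arm:
    # sample a brand-new next state only while the number of stored children
    # is at most KW * arm_visits^ALPHA; otherwise revisit an existing child,
    # so each child eventually receives many simulations.
    if len(arm["children"]) <= KW * arm["visits"] ** ALPHA:
        next_state, reward = transition(node.state, arm["action"])
        child = Node(next_state)
        arm["children"].append((reward, child))
    else:
        reward, child = random.choice(arm["children"])

    total = reward + simulate(child, depth + 1)
    arm["value"] += (total - arm["value"]) / arm["visits"]  # incremental mean
    return total

if __name__ == "__main__":
    root = Node(state=2.0)
    for _ in range(2000):
        simulate(root, 0)
    best = max(root.arms, key=lambda a: a["visits"])
    print("recommended action:", best["action"])

With ALPHA = 0, the second test degenerates to keeping a single sampled outcome per action (the classical deterministic behaviour), while letting ALPHA grow toward 1 approaches vanilla UCT on the sampled tree; the interesting regime discussed in the paper is in between, where both the set of actions and the set of sampled outcomes grow slowly with the visit counts.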