Upper confidence tree-based consistent reactive planning application to minesweeper

Authors:
Michèle Sebag; Olivier Teytaud
Affiliations:
TAO-INRIA, LRI, CNRS UMR 8623, Université Paris-Sud, Orsay, France; OASE Lab, National University of Tainan, Taiwan
Venue:
LION'12 Proceedings of the 6th international conference on Learning and Intelligent Optimization
Year:
2012


Abstract

Many reactive planning tasks are tackled through myopic optimization-based approaches. Specifically, the problem is simplified by considering only the observations available at the current time step and an estimate of the future system behavior; the optimal decision given this information is computed, and the simplified problem description is updated with the new observations available at each time step. While this approach does not yield optimal strategies stricto sensu, it gives good results at a reasonable computational cost for highly intractable problems whenever fast off-the-shelf solvers are available for the simplified problem. The increase in available computational power, however, makes it possible to go beyond the intrinsic limitations of myopic reactive planning approaches, even though the search for optimal strategies remains intractable with brute-force approaches. This paper proposes a consistent reactive planning approach that embeds a solver in an Upper Confidence Tree (UCT) algorithm. The solver yields a consistent estimate of the belief state, and the UCT exploits this estimate (both in the tree nodes and through the Monte-Carlo simulator) to achieve an asymptotically optimal policy. The paper shows the consistency of the proposed Upper Confidence Tree-based Consistent Reactive Planning algorithm and presents a proof of principle of its performance on the MineSweeper game, a classical success of the myopic approach.
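
The abstract does not spell out the bandit rule used inside the tree. As a rough illustration only, the sketch below shows the standard UCB1 selection step that Upper Confidence Tree methods commonly apply at each node. The function names, the `stats` layout, and the exploration constant `c` are assumptions made for this example and are not taken from the paper; the belief-state solver and the Monte-Carlo simulator are omitted.

```python
import math


def ucb1_score(mean_reward, visits, parent_visits, c=math.sqrt(2)):
    """Standard UCB1 score: empirical mean plus an exploration bonus.

    Unvisited actions get +inf so that each one is tried at least once.
    The default c = sqrt(2) gives the usual sqrt(2 ln n / n_i) bonus;
    it is an assumption here, not a value from the paper.
    """
    if visits == 0:
        return math.inf
    return mean_reward + c * math.sqrt(math.log(parent_visits) / visits)


def select_action(stats, c=math.sqrt(2)):
    """Pick the action with the highest UCB1 score at one tree node.

    `stats` (hypothetical layout) maps action -> (total_reward, visits).
    Ties go to the first action in iteration order.
    """
    parent_visits = sum(visits for _, visits in stats.values())
    best_action, best_score = None, -math.inf
    for action, (total_reward, visits) in stats.items():
        mean = total_reward / visits if visits else 0.0
        score = ucb1_score(mean, visits, parent_visits, c)
        if score > best_score:
            best_action, best_score = action, score
    return best_action
```

In a full UCT loop this selection would be repeated from the root down to a leaf, a simulation would be run from that leaf, and the resulting reward would be added back into `stats` along the visited path. A rarely tried action can win the selection over one with a higher empirical mean, which is the exploration behavior the confidence bound is meant to provide.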