Many reactive planning tasks are tackled through myopic optimization-based approaches. Specifically, the problem is simplified by considering only the observations available at the current time step and an estimate of the future system behavior; the optimal decision given this information is computed, and the simplified problem description is updated with the new observations available at each time step. While this approach does not yield optimal strategies stricto sensu, it gives good results at a reasonable computational cost for highly intractable problems, provided that fast off-the-shelf solvers are available for the simplified problem. The search for optimal strategies remains intractable with brute-force approaches; nevertheless, the increase in available computational power makes it possible to go beyond the intrinsic limitations of myopic reactive planning. This paper proposes a consistent reactive planning approach that embeds a solver in an Upper Confidence Tree (UCT) algorithm. The solver yields a consistent estimate of the belief state, and UCT exploits this estimate (both in the tree nodes and through the Monte-Carlo simulator) to achieve an asymptotically optimal policy. The paper shows the consistency of the proposed Upper Confidence Tree-based Consistent Reactive Planning algorithm and presents a proof of principle of its performance on the MineSweeper game, a classical success of the myopic approach.
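To illustrate the kind of selection rule that Upper Confidence Tree algorithms typically apply at each tree node, the Python sketch below implements the standard UCB1 score (empirical mean reward plus an exploration bonus). It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the `stats` layout, and the exploration constant `c` are hypothetical, and the belief-state solver and the Monte-Carlo simulator described in the abstract are left out.

```python
import math


def ucb1_score(mean_reward, visits, parent_visits, c=math.sqrt(2)):
    """Return the UCB1 score: empirical mean plus an exploration bonus.

    The default constant c = sqrt(2) is an assumption for illustration.
    """
    if visits == 0:
        # Unvisited actions get an infinite score so they are tried first.
        return math.inf
    return mean_reward + c * math.sqrt(math.log(parent_visits) / visits)


def select_action(stats, c=math.sqrt(2)):
    """Pick the action with the highest UCB1 score at one tree node.

    stats maps each action to a (total_reward, visits) pair
    (a hypothetical layout chosen for this sketch).
    """
    parent_visits = sum(visits for _, visits in stats.values())

    def score(action):
        total_reward, visits = stats[action]
        mean_reward = total_reward / visits if visits else 0.0
        return ucb1_score(mean_reward, visits, parent_visits, c)

    return max(stats, key=score)
```

For example, `select_action({"a": (9.0, 10), "b": (0.0, 0)})` picks the unvisited action `"b"`, while with equal visit counts the action with the higher mean reward wins because both actions receive the same exploration bonus.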