Monte Carlo Tree Search for Bayesian Reinforcement Learning
ICMLA '12 Proceedings of the 2012 11th International Conference on Machine Learning and Applications - Volume 01
Bayesian planning is a formally elegant approach to learning optimal behaviour under model uncertainty, trading off exploration and exploitation in an ideal way. Unfortunately, planning optimally in the face of uncertainty is notoriously taxing, since the search space is enormous. In this paper we introduce a tractable, sample-based method for approximate Bayes-optimal planning which exploits Monte-Carlo tree search. Our approach avoids expensive applications of Bayes' rule within the search tree by sampling models from the current beliefs, and furthermore performs this sampling lazily. This enables it to outperform previous Bayesian model-based reinforcement learning algorithms by a significant margin on several well-known benchmark problems. As we show, our approach can even work in problems with an infinite state space that lie qualitatively out of reach of almost all previous work in Bayesian exploration.
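To make the abstract's idea concrete, below is a minimal Python sketch of root-sampled Monte-Carlo tree search for Bayesian reinforcement learning, not the authors' implementation. It assumes a tabular MDP with a known reward function and an independent Dirichlet posterior over each transition row; all names (DirichletPosterior, lazy_model, bamcp_action), the problem sizes, and the collapsing of the search tree to flat state-action statistics are illustrative simplifications. A fresh model is sampled from the posterior once per simulation at the root, so Bayes' rule is never applied inside the tree, and each transition row of that model is only drawn the first time a simulation actually visits it (lazy sampling).

    import math
    from collections import defaultdict

    import numpy as np

    # Hypothetical problem sizes; any tabular MDP with known rewards would do.
    N_STATES, N_ACTIONS = 5, 2
    GAMMA, UCB_C = 0.95, 2.0


    class DirichletPosterior:
        """Independent Dirichlet posterior over each transition row P(. | s, a)."""

        def __init__(self, prior=1.0):
            self.counts = np.full((N_STATES, N_ACTIONS, N_STATES), prior)

        def update(self, s, a, s2):
            self.counts[s, a, s2] += 1.0  # conjugate update from one real transition

        def lazy_model(self):
            """Return a simulator that draws each row P(. | s, a) on first use only."""
            rows = {}

            def step(s, a):
                if (s, a) not in rows:  # lazy sampling: draw the row when first needed
                    rows[(s, a)] = np.random.dirichlet(self.counts[s, a])
                return np.random.choice(N_STATES, p=rows[(s, a)])

            return step


    def bamcp_action(posterior, reward, root, n_sims=500, depth=30):
        """Root-sampling MCTS: one sampled model per simulation, no Bayes rule in the tree."""
        visits = defaultdict(int)    # N(s, a) within the search tree
        values = defaultdict(float)  # Q(s, a) running averages

        def simulate(step, s, d):
            if d == depth:
                return 0.0
            n_s = sum(visits[(s, b)] for b in range(N_ACTIONS))
            # UCB1 action selection; untried actions get infinite priority.
            a = max(range(N_ACTIONS),
                    key=lambda b: float("inf") if visits[(s, b)] == 0
                    else values[(s, b)] + UCB_C * math.sqrt(math.log(n_s) / visits[(s, b)]))
            s2 = step(s, a)
            r = reward(s, a) + GAMMA * simulate(step, s2, d + 1)
            visits[(s, a)] += 1
            values[(s, a)] += (r - values[(s, a)]) / visits[(s, a)]  # incremental mean
            return r

        for _ in range(n_sims):
            simulate(posterior.lazy_model(), root, 0)  # fresh model sampled at the root
        return max(range(N_ACTIONS), key=lambda b: values[(root, b)])


    if __name__ == "__main__":
        post = DirichletPosterior()
        # Hypothetical reward: 1 in the last state, 0 elsewhere; rewards assumed known.
        print(bamcp_action(post, lambda s, a: float(s == N_STATES - 1), root=0))

In actual use one would interleave acting and learning: after every real environment transition (s, a, s2), call posterior.update(s, a, s2) and re-plan the next action with bamcp_action. The key design point the abstract emphasises is visible in lazy_model: belief updates never occur inside the search tree, and only the parts of a sampled model that a simulation touches are ever instantiated.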