Monte Carlo Tree Search for Bayesian Reinforcement Learning
ICMLA '12 Proceedings of the 2012 11th International Conference on Machine Learning and Applications - Volume 01
Bayesian planning is a formally elegant approach to learning optimal behaviour under model uncertainty, trading off exploration and exploitation in an ideal way. Unfortunately, planning optimally in the face of uncertainty is notoriously taxing, since the search space is enormous. In this paper we introduce a tractable, sample-based method for approximate Bayes-optimal planning which exploits Monte-Carlo tree search. Our approach avoids expensive applications of Bayes' rule within the search tree by sampling models from the current beliefs, and furthermore performs this sampling lazily. This enables it to outperform previous Bayesian model-based reinforcement learning algorithms by a significant margin on several well-known benchmark problems. As we show, our approach can even work in problems with an infinite state space that lie qualitatively out of reach of almost all previous work in Bayesian exploration.
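To make the abstract's idea concrete, below is a minimal Python sketch of root-sampled Monte-Carlo tree search for Bayesian reinforcement learning, not the authors' implementation. It assumes a tabular MDP with a known reward function and an independent Dirichlet posterior over each transition row; all names (DirichletPosterior, lazy_model, bamcp_action), the problem sizes, and the collapsing of the search tree to flat state-action statistics are illustrative simplifications. A fresh model is sampled from the posterior once per simulation at the root, so Bayes' rule is never applied inside the tree, and each transition row of that model is only drawn the first time a simulation actually visits it (lazy sampling).

    import math
    from collections import defaultdict

    import numpy as np

    # Hypothetical problem sizes; any tabular MDP with known rewards would do.
    N_STATES, N_ACTIONS = 5, 2
    GAMMA, UCB_C = 0.95, 2.0


    class DirichletPosterior:
        """Independent Dirichlet posterior over each transition row P(. | s, a)."""

        def __init__(self, prior=1.0):
            self.counts = np.full((N_STATES, N_ACTIONS, N_STATES), prior)

        def update(self, s, a, s2):
            self.counts[s, a, s2] += 1.0  # conjugate update from one real transition

        def lazy_model(self):
            """Return a simulator that draws each row P(. | s, a) on first use only."""
            rows = {}

            def step(s, a):
                if (s, a) not in rows:  # lazy sampling: draw the row when first needed
                    rows[(s, a)] = np.random.dirichlet(self.counts[s, a])
                return np.random.choice(N_STATES, p=rows[(s, a)])

            return step


    def bamcp_action(posterior, reward, root, n_sims=500, depth=30):
        """Root-sampling MCTS: one sampled model per simulation, no Bayes rule in the tree."""
        visits = defaultdict(int)    # N(s, a) within the search tree
        values = defaultdict(float)  # Q(s, a) running averages

        def simulate(step, s, d):
            if d == depth:
                return 0.0
            n_s = sum(visits[(s, b)] for b in range(N_ACTIONS))
            # UCB1 action selection; untried actions get infinite priority.
            a = max(range(N_ACTIONS),
                    key=lambda b: float("inf") if visits[(s, b)] == 0
                    else values[(s, b)] + UCB_C * math.sqrt(math.log(n_s) / visits[(s, b)]))
            s2 = step(s, a)
            r = reward(s, a) + GAMMA * simulate(step, s2, d + 1)
            visits[(s, a)] += 1
            values[(s, a)] += (r - values[(s, a)]) / visits[(s, a)]  # incremental mean
            return r

        for _ in range(n_sims):
            simulate(posterior.lazy_model(), root, 0)  # fresh model sampled at the root
        return max(range(N_ACTIONS), key=lambda b: values[(root, b)])


    if __name__ == "__main__":
        post = DirichletPosterior()
        # Hypothetical reward: 1 in the last state, 0 elsewhere; rewards assumed known.
        print(bamcp_action(post, lambda s, a: float(s == N_STATES - 1), root=0))

In actual use one would interleave acting and learning: after every real environment transition (s, a, s2), call posterior.update(s, a, s2) and re-plan the next action with bamcp_action. The key design point the abstract emphasises is visible in lazy_model: belief updates never occur inside the search tree, and only the parts of a sampled model that a simulation touches are ever instantiated.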