Introduction to Reinforcement Learning
Near-Optimal Reinforcement Learning in Polynomial Time
ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes
IJCAI '99 Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence
Efficient Exploration In Reinforcement Learning
Optimal learning: computational procedures for Bayes-adaptive Markov decision processes
The Linear Programming Approach to Approximate Dynamic Programming
Operations Research
On Constraint Sampling in the Linear Programming Approach to Approximate Dynamic Programming
Mathematics of Operations Research
A theoretical analysis of Model-Based Interval Estimation
ICML '05 Proceedings of the 22nd international conference on Machine learning
Bayesian sparse sampling for on-line reward optimization
ICML '05 Proceedings of the 22nd international conference on Machine learning
R-MAX: a general polynomial time algorithm for near-optimal reinforcement learning
IJCAI'01 Proceedings of the 17th international joint conference on Artificial intelligence - Volume 2
Model based Bayesian exploration
UAI'99 Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence
Bayesian reinforcement learning in continuous POMDPs with Gaussian processes
IROS'09 Proceedings of the 2009 IEEE/RSJ international conference on Intelligent robots and systems
Provably Efficient Learning with Typed Parametric Models
The Journal of Machine Learning Research
A Bayesian Approach for Learning and Planning in Partially Observable Markov Decision Processes
The Journal of Machine Learning Research
Monte-Carlo tree search for Bayesian reinforcement learning
Applied Intelligence
Scalable and efficient Bayes-adaptive reinforcement learning based on Monte-Carlo tree search
Journal of Artificial Intelligence Research
A key problem in reinforcement learning is finding a good balance between the need to explore the environment and the need to gain rewards by exploiting existing knowledge. Much research has been devoted to this topic, and many of the proposed methods are aimed simply at ensuring that enough samples are gathered to estimate the value function well. In contrast, [Bellman and Kalaba, 1959] proposed constructing a representation in which the states of the original system are paired with knowledge about the current model. Hence, knowledge about the possible Markov models of the environment is represented and maintained explicitly. Unfortunately, this approach is intractable except for bandit problems (where it gives rise to Gittins indices, an optimal exploration method). In this paper, we explore ideas for making this method computationally tractable. We maintain a model of the environment as a Markov Decision Process. We sample finite-length trajectories from the infinite tree using ideas based on sparse sampling. Finding the values of the nodes of this sparse subtree can then be expressed as an optimization problem, which we solve using Linear Programming. We illustrate this approach on a few domains and compare it with other exploration algorithms.
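The sparse-sampling step can be sketched as follows. This is a minimal illustration in the style of the Kearns, Mansour, and Ng algorithm cited above, which the abstract builds on; the generative-model interface `sim` and all parameter names are our own illustrative choices, and the recursive max-backup shown here stands in for the paper's Linear Programming formulation of the node values.

```python
import random

def sparse_sampling_value(sim, state, actions, depth, width, gamma, rng):
    """Estimate V(state) with a sparse lookahead tree of the given depth.

    `sim(state, action, rng) -> (next_state, reward)` is a generative
    model of the MDP. Each (node, action) pair is expanded with only
    `width` sampled successors, so the tree size depends on depth and
    width but not on the number of states in the MDP."""
    if depth == 0:
        return 0.0
    best = float("-inf")
    for a in actions:
        total = 0.0
        for _ in range(width):
            next_state, reward = sim(state, a, rng)
            total += reward + gamma * sparse_sampling_value(
                sim, next_state, actions, depth - 1, width, gamma, rng)
        best = max(best, total / width)
    return best

# Toy deterministic MDP: action 1 always yields reward 1, action 0 yields 0.
def sim(state, action, rng):
    return state, float(action)

v = sparse_sampling_value(sim, 0, [0, 1], depth=2, width=2,
                          gamma=0.5, rng=random.Random(0))
print(v)  # 1.5 (= 1 + 0.5 * 1, always picking action 1)
```

In the Bayes-adaptive setting described in the abstract, `state` would be a (system state, model knowledge) pair, so the lookahead tree is infinite and sampling finite-length trajectories from it is what makes the computation feasible.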