Near-Bayesian exploration in polynomial time

Authors:
J. Zico Kolter; Andrew Y. Ng
Affiliations:
Stanford University, CA; Stanford University, CA
Venue:
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Year:
2009

Abstract

We consider the exploration/exploitation problem in reinforcement learning (RL). The Bayesian approach to model-based RL offers an elegant solution to this problem: it maintains a distribution over possible models and acts to maximize expected reward. Unfortunately, the Bayesian solution is intractable for all but very restricted cases. In this paper we present a simple algorithm and prove that, with high probability, it performs ε-close to the true (intractable) optimal Bayesian policy after a small number of time steps, polynomial in the quantities describing the system. The algorithm and analysis are motivated by the so-called PAC-MDP approach, and extend such results to the setting of Bayesian RL. In this setting, we show that we can achieve lower sample complexity bounds than existing algorithms, while using an exploration strategy that is much greedier than the (extremely cautious) exploration of PAC-MDP algorithms.
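
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea it describes: keep a distribution over models, plan in the posterior-mean model, and add a reward bonus that shrinks as a state-action pair is tried more often. Everything specific here is an assumption for illustration and is not taken from the abstract: the Dirichlet prior over transitions, the bonus form `beta / (1 + n)`, the finite planning horizon, known rewards, and the hypothetical function name `bonus_greedy_action`.

```python
def bonus_greedy_action(counts, rewards, state, beta=2.0, horizon=10, prior=1.0):
    """Pick an action by planning in the posterior-mean MDP with a count bonus.

    counts[s][a][t]: observed transitions s --a--> t (assumed data layout).
    rewards[s][a]:   known immediate reward (assumption: rewards are known).
    beta, horizon, prior: illustrative parameters, not values from the paper.
    horizon must be at least 1.
    """
    n_states = len(counts)
    n_actions = len(counts[0])

    # Dirichlet parameters: prior pseudo-count plus observed transition counts.
    alpha = [[[prior + c for c in counts[s][a]] for a in range(n_actions)]
             for s in range(n_states)]
    # Total pseudo-count per state-action pair; grows as the pair is visited.
    alpha0 = [[sum(alpha[s][a]) for a in range(n_actions)]
              for s in range(n_states)]

    values = [0.0] * n_states
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(horizon):
        # Finite-horizon value iteration on the mean model plus bonus.
        q = [[rewards[s][a]
              + beta / (1.0 + alpha0[s][a])  # exploration bonus
              + sum(alpha[s][a][t] / alpha0[s][a] * values[t]
                    for t in range(n_states))
              for a in range(n_actions)]
             for s in range(n_states)]
        values = [max(row) for row in q]

    # Act greedily with respect to the bonus-augmented values.
    return max(range(n_actions), key=lambda a: q[state][a])


# Action 0 has never been tried, action 1 has 100 samples; rewards are equal.
untried = [[[0, 0], [50, 50]], [[0, 0], [50, 50]]]
equal_rewards = [[0.5, 0.5], [0.5, 0.5]]
choice_when_uncertain = bonus_greedy_action(untried, equal_rewards, state=0)

# Both actions are well sampled; action 1 simply pays more.
well_sampled = [[[500, 500], [500, 500]], [[500, 500], [500, 500]]]
unequal_rewards = [[0.2, 0.8], [0.2, 0.8]]
choice_when_certain = bonus_greedy_action(well_sampled, unequal_rewards, state=0)
```

In this sketch the agent always acts greedily; exploration comes only from the bonus, which fades as counts accumulate. That is one way to read the abstract's contrast with the more cautious exploration of PAC-MDP algorithms, under the assumptions stated above.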