AAAI '98/IAAI '98 Proceedings of the fifteenth national/tenth conference on Artificial intelligence/Innovative applications of artificial intelligence
Exploration Control in Reinforcement Learning using Optimistic Model Selection
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Near-Optimal Reinforcement Learning in Polynominal Time
ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
Algorithms for Inverse Reinforcement Learning
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
A Bayesian Framework for Reinforcement Learning
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Learning in embedded systems
Optimal learning: computational procedures for bayes-adaptive markov decision processes
Optimal learning: computational procedures for bayes-adaptive markov decision processes
R-max - a general polynomial time algorithm for near-optimal reinforcement learning
The Journal of Machine Learning Research
The Linear Programming Approach to Approximate Dynamic Programming
Operations Research
Apprenticeship learning via inverse reinforcement learning
ICML '04 Proceedings of the twenty-first international conference on Machine learning
On Constraint Sampling in the Linear Programming Approach to Approximate Dynamic Programming
Mathematics of Operations Research
Bayesian sparse sampling for on-line reward optimization
ICML '05 Proceedings of the 22nd international conference on Machine learning
An analytic solution to discrete Bayesian reinforcement learning
ICML '06 Proceedings of the 23rd international conference on Machine learning
Pathwise Stochastic Optimal Control
SIAM Journal on Control and Optimization
An analysis of model-based Interval Estimation for Markov Decision Processes
Journal of Computer and System Sciences
Reinforcement Learning in Finite MDPs: PAC Analysis
The Journal of Machine Learning Research
A Bayesian sampling approach to exploration in reinforcement learning
UAI '09 Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence
Information Relaxations and Duality in Stochastic Dynamic Programs
Operations Research
Near-optimal Regret Bounds for Reinforcement Learning
The Journal of Machine Learning Research
Smarter sampling in model-based Bayesian reinforcement learning
ECML PKDD'10 Proceedings of the 2010 European conference on Machine learning and knowledge discovery in databases: Part I
Preference elicitation and inverse reinforcement learning
ECML PKDD'11 Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part III
Model based Bayesian exploration
UAI'99 Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence
Multi-Task reinforcement learning: shaping and feature selection
EWRL'11 Proceedings of the 9th European conference on Recent Advances in Reinforcement Learning
Bayesian multitask inverse reinforcement learning
EWRL'11 Proceedings of the 9th European conference on Recent Advances in Reinforcement Learning
Bayesian multitask inverse reinforcement learning
EWRL'11 Proceedings of the 9th European conference on Recent Advances in Reinforcement Learning
Linear Bayesian reinforcement learning
IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence
Hi-index | 0.00 |
In the Bayesian approach to sequential decision making, exact calculation of the (subjective) utility is intractable. This extends to most special cases of interest, such as reinforcement learning problems. While utility bounds are known to exist for this problem, so far none of them were particularly tight. In this paper, we show how to efficiently calculate a lower bound, which corresponds to the utility of a near-optimal memoryless policy for the decision problem, which is generally different from both the Bayes-optimal policy and the policy which is optimal for the expected MDP under the current belief. We then show how these can be applied to obtain robust exploration policies in a Bayesian reinforcement learning setting.