An important subclass of reinforcement learning problems consists of those that exhibit only discrete uncertainty: the agent's environment is known to be sampled from a finite set of possible worlds. In contrast to generic reinforcement learning problems, the Bayes-optimal policy can be computed efficiently for many such discrete-uncertainty RL domains. We demonstrate empirically that the Bayes-optimal policy can yield substantially and significantly better performance than a state-of-the-art probably approximately correct (PAC) RL algorithm. Our second contribution is to bound the error incurred by using slightly noisy estimates of the parameters of the discrete set of possible Markov decision processes during learning. This situation is both likely and important in practice, since such models will often be constructed from finite amounts of noisy, real-world data. We demonstrate good empirical performance on a simulated machine-repair problem when using noisy parameter estimates.
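To make the discrete-uncertainty setting concrete, the following is a minimal sketch, not the paper's implementation: the true environment is one of K candidate MDPs with known transition models, and the agent maintains a Bayesian belief over which candidate it is acting in, updated by Bayes' rule after each observed transition. All names here (update_belief, transition_models, the toy two-model example) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def update_belief(belief, transition_models, s, a, s_next):
    """Bayes-rule update of the belief over K candidate MDPs after
    observing the transition (s, a, s_next).

    belief            : shape (K,)         prior P(model k)
    transition_models : shape (K, S, A, S) per-model transition probabilities
    """
    likelihoods = transition_models[:, s, a, s_next]  # P(s' | s, a, model k)
    posterior = belief * likelihoods
    total = posterior.sum()
    if total == 0.0:  # observation impossible under every candidate model
        return belief
    return posterior / total

# Toy example: two candidate 2-state, 1-action MDPs that differ only in dynamics.
T = np.zeros((2, 2, 1, 2))
T[0, :, 0, :] = [[0.9, 0.1], [0.1, 0.9]]  # model 0: "sticky" states
T[1, :, 0, :] = [[0.5, 0.5], [0.5, 0.5]]  # model 1: uniform dynamics

belief = np.array([0.5, 0.5])  # uniform prior over the two possible worlds
for (s, a, s_next) in [(0, 0, 0), (0, 0, 0), (0, 0, 1)]:
    belief = update_belief(belief, T, s, a, s_next)
print(belief)  # mass shifts toward the sticky model after repeated self-transitions
```

Because the belief is a distribution over only K discrete hypotheses rather than over a continuous parameter space, the belief-augmented planning problem stays tractable, which is what makes computing the Bayes-optimal policy feasible in this subclass.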