We consider the active learning problem of inferring the transition model of a Markov Decision Process by acting and observing transitions. This is particularly useful when no reward function is defined a priori. We propose casting the active learning task as a utility maximization problem using Bayesian reinforcement learning with belief-dependent rewards. After presenting three possible performance criteria, we derive from them the belief-dependent rewards to be used in the decision-making process. As computing the optimal Bayesian value function is intractable for large horizons, we use a simple algorithm to approximately solve this optimization problem. Despite the sub-optimality of this technique, we show experimentally that our proposal is efficient in a number of domains.
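To make the idea concrete, here is a minimal, hedged sketch of active learning of an MDP transition model with a belief-dependent reward. It assumes an independent Dirichlet belief per (state, action) pair, uses the expected one-step information gain (KL divergence between the updated and current Dirichlet) as the belief-dependent reward, and replaces the intractable Bayesian planning problem with a myopic greedy choice. The specific reward, the greedy solver, the toy MDP, and all function names are illustrative assumptions, not the paper's exact criteria or algorithm.

```python
# Hedged sketch: active learning of an MDP transition model with Dirichlet
# beliefs and an information-gain (belief-dependent) reward. The myopic greedy
# action selection is an assumed stand-in for the paper's approximate solver.
import numpy as np
from scipy.special import gammaln, digamma


def dirichlet_kl(alpha_new, alpha_old):
    """KL( Dir(alpha_new) || Dir(alpha_old) )."""
    a0_new, a0_old = alpha_new.sum(), alpha_old.sum()
    return (gammaln(a0_new) - gammaln(a0_old)
            - np.sum(gammaln(alpha_new) - gammaln(alpha_old))
            + np.sum((alpha_new - alpha_old)
                     * (digamma(alpha_new) - digamma(a0_new))))


def expected_info_gain(alpha):
    """Expected KL gain from one observation drawn from the predictive distribution."""
    p = alpha / alpha.sum()                # predictive next-state probabilities
    gain = 0.0
    for s_next, prob in enumerate(p):
        alpha_new = alpha.copy()
        alpha_new[s_next] += 1.0           # hypothetical Bayesian update
        gain += prob * dirichlet_kl(alpha_new, alpha)
    return gain


def active_learning_run(true_P, n_steps=500, seed=0):
    """Greedily pick the action whose belief promises the largest expected
    information gain, observe a real transition, and update the posterior."""
    rng = np.random.default_rng(seed)
    S, A = true_P.shape[0], true_P.shape[1]
    alphas = np.ones((S, A, S))            # uniform Dirichlet priors
    s = 0
    for _ in range(n_steps):
        a = int(np.argmax([expected_info_gain(alphas[s, ai]) for ai in range(A)]))
        s_next = rng.choice(S, p=true_P[s, a])
        alphas[s, a, s_next] += 1.0        # posterior update from the observed transition
        s = s_next
    return alphas / alphas.sum(axis=-1, keepdims=True)   # posterior-mean model


# Toy 3-state, 2-action MDP used only to exercise the sketch.
true_P = np.array([[[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]],
                   [[0.2, 0.6, 0.2], [0.5, 0.3, 0.2]],
                   [[0.3, 0.3, 0.4], [0.1, 0.1, 0.8]]])
print(np.round(active_learning_run(true_P), 2))
```

In this sketch the "value" of an action is purely epistemic: it is high when the corresponding Dirichlet belief is still diffuse and low once the transition distribution is well estimated, so the agent naturally drives itself toward poorly explored (state, action) pairs without any external reward signal.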