We state the problem of inverse reinforcement learning in terms of preference elicitation, resulting in a principled (Bayesian) statistical formulation. This generalises previous work on Bayesian inverse reinforcement learning and allows us to obtain a posterior distribution over the agent's preferences, its policy and, optionally, the obtained reward sequence, from observations. We examine how the resulting approach relates to other statistical methods for inverse reinforcement learning, via both analysis and experimental results. We show that preferences can be determined accurately even when the observed agent's policy is sub-optimal with respect to its own preferences. In that case, the policies we obtain are significantly better with respect to the agent's preferences than those of other methods, and better than the demonstrated policy itself.
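To make the formulation above concrete, here is a minimal sketch of Bayesian inverse reinforcement learning in Python: a random-walk Metropolis-Hastings sampler over a tabular reward vector, using a softmax (Boltzmann) demonstrator likelihood as a stand-in for the paper's preference-elicitation model. The MDP, the inverse temperature BETA, the Gaussian prior and all other constants below are illustrative assumptions, not the paper's specification.

```python
# Minimal Bayesian IRL sketch (illustrative; not the paper's exact algorithm).
# Assumed model: tabular MDP, Boltzmann demonstrator with inverse temperature
# BETA, standard-normal prior over a per-state reward vector r.
import numpy as np

S, A, GAMMA, BETA = 5, 2, 0.95, 5.0
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(S), size=(S, A))  # P[s, a] = distribution over next states

def q_values(r, n_iter=200):
    """Q-values of the optimal policy for state reward vector r (value iteration)."""
    Q = np.zeros((S, A))
    for _ in range(n_iter):
        Q = r[:, None] + GAMMA * P @ Q.max(axis=1)  # (S,A,S) @ (S,) -> (S,A)
    return Q

def log_likelihood(r, demos):
    """log P(demos | r) under the softmax action model P(a | s) ~ exp(BETA * Q(s, a))."""
    Q = BETA * q_values(r)
    m = Q.max(axis=1, keepdims=True)
    log_z = (m + np.log(np.exp(Q - m).sum(axis=1, keepdims=True))).ravel()
    return sum(Q[s, a] - log_z[s] for s, a in demos)

def posterior_samples(demos, n=2000, step=0.1):
    """Random-walk Metropolis-Hastings over reward vectors."""
    r = np.zeros(S)
    lp = log_likelihood(r, demos) - 0.5 * r @ r  # N(0, I) log-prior (up to a constant)
    samples = []
    for _ in range(n):
        proposal = r + step * rng.normal(size=S)
        lp_new = log_likelihood(proposal, demos) - 0.5 * proposal @ proposal
        if np.log(rng.uniform()) < lp_new - lp:  # MH accept/reject
            r, lp = proposal, lp_new
        samples.append(r.copy())
    return np.array(samples)

# Usage: demonstrations generated by a greedy policy for a hidden reward.
true_r = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
pi = q_values(true_r).argmax(axis=1)
demos = [(s, pi[s]) for s in range(S)] * 20  # repeated state-action observations
post = posterior_samples(demos)
print("posterior mean reward:", post[len(post) // 2:].mean(axis=0))  # discard burn-in
```

Because the likelihood only assumes the demonstrator acts softmax-greedily in its own Q-values, a sampler of this kind can recover a useful posterior even from noisy or sub-optimal demonstrations, which mirrors the claim made in the abstract.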