Matrix analysis
Linear optimization and extensions: theory and algorithms
Markov Decision Processes: Discrete Stochastic Dynamic Programming
Apprenticeship learning via inverse reinforcement learning. ICML '04: Proceedings of the Twenty-First International Conference on Machine Learning
ICML '06: Proceedings of the 23rd International Conference on Machine Learning
Active Learning for Reward Estimation in Inverse Reinforcement Learning. ECML PKDD '09: Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases, Part II
Learning behavior styles with inverse reinforcement learning. ACM SIGGRAPH 2010 Papers
Analysis of Inverse Reinforcement Learning with Perturbed Demonstrations. ECAI 2010: Proceedings of the 19th European Conference on Artificial Intelligence
Learning from demonstration using MDP induced metrics. ECML PKDD '10: Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases, Part II
Inverse Reinforcement Learning in Partially Observable Environments. The Journal of Machine Learning Research
Batch, off-policy and model-free apprenticeship learning. EWRL '11: Proceedings of the 9th European Conference on Recent Advances in Reinforcement Learning
Besting the quiz master: crowdsourcing incremental classification games. EMNLP-CoNLL '12: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Structured apprenticeship learning. ECML PKDD '12: Proceedings of the 2012 European Conference on Machine Learning and Knowledge Discovery in Databases, Part II
Apprenticeship learning with few examples. Neurocomputing
Scenario Trees and Policy Selection for Multistage Stochastic Programming Using Machine Learning. INFORMS Journal on Computing
Bayesian nonparametric feature construction for inverse reinforcement learning. IJCAI '13: Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence
In apprenticeship learning, the goal is to learn a policy in a Markov decision process that is at least as good as a policy demonstrated by an expert. The difficulty is that the MDP's true reward function is assumed to be unknown. We show how to frame apprenticeship learning as a linear programming problem, and show that using an off-the-shelf LP solver to solve this problem yields a substantial improvement in running time over existing methods: up to two orders of magnitude faster in our experiments. Additionally, our approach produces stationary policies, while all existing methods for apprenticeship learning output policies that are "mixed", i.e., randomized combinations of stationary policies. The technique used is general enough to convert any mixed policy to a stationary policy.
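The LP formulation described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's exact implementation: the variables are an occupancy measure x(s, a) plus a margin B, the objective maximizes B subject to the learner's feature expectations exceeding the expert's by B, and Bellman flow constraints keep x consistent with the MDP dynamics. The toy 2-state MDP, the state-indicator features, and the expert policy below are all invented for illustration; SciPy's `linprog` stands in for the "off-the-shelf LP solver".

```python
import numpy as np
from scipy.optimize import linprog

S, A, K = 2, 2, 2                       # states, actions, feature dimension (toy sizes)
gamma = 0.9                             # discount factor
alpha = np.array([0.5, 0.5])            # initial state distribution

# P[s, a, s']: deterministic toy dynamics; action 0 moves to state 0, action 1 to state 1.
P = np.zeros((S, A, S))
P[:, 0, 0] = 1.0
P[:, 1, 1] = 1.0

# phi[s, a, k]: state-indicator features (an illustrative assumption).
phi = np.zeros((S, A, K))
for s in range(S):
    phi[s, :, s] = 1.0

# Hypothetical expert: always takes action 1 (prefers state 1).
pi_E = np.zeros((S, A))
pi_E[:, 1] = 1.0

# Expert's discounted state occupancy d_E solves d = alpha + gamma * P_pi^T d.
P_pi = np.einsum('sa,sat->st', pi_E, P)
d_E = np.linalg.solve(np.eye(S) - gamma * P_pi.T, alpha)
x_E = d_E[:, None] * pi_E               # expert occupancy over (s, a) pairs
mu_E = np.einsum('sa,sak->k', x_E, phi) # expert feature expectations

# LP variables: x(s, a) flattened (S*A entries), followed by the margin B.
n = S * A + 1
c = np.zeros(n)
c[-1] = -1.0                            # linprog minimizes, so minimize -B

# Margin constraints: sum_{s,a} x(s,a) phi_k(s,a) >= mu_E[k] + B, for every feature k.
A_ub = np.zeros((K, n))
b_ub = np.zeros(K)
for k in range(K):
    A_ub[k, :-1] = -phi[:, :, k].ravel()
    A_ub[k, -1] = 1.0
    b_ub[k] = -mu_E[k]

# Bellman flow constraints: sum_a x(s,a) - gamma * sum_{s',a'} P(s|s',a') x(s',a') = alpha(s).
A_eq = np.zeros((S, n))
b_eq = alpha.copy()
for s in range(S):
    for sp in range(S):
        for ap in range(A):
            A_eq[s, sp * A + ap] = (1.0 if sp == s else 0.0) - gamma * P[sp, ap, s]

bounds = [(0, None)] * (S * A) + [(None, None)]   # x >= 0, B unbounded
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)

# The optimal x is itself an occupancy measure, so the learned policy is
# stationary: pi(a | s) is proportional to x(s, a). No mixing is needed.
x = res.x[:-1].reshape(S, A)
pi = x / x.sum(axis=1, keepdims=True)
margin = -res.fun                       # optimal B; >= 0 because x_E is always feasible
```

Because the expert's own occupancy measure satisfies every constraint with B = 0, the optimal margin is never negative, which is one way to read the "at least as good as the expert" guarantee. In this toy instance the solver recovers the expert's policy (always action 1) directly as a stationary policy.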