Apprenticeship learning using linear programming

Authors:
Umar Syed; Michael Bowling; Robert E. Schapire
Affiliations:
Princeton University, Princeton, NJ; University of Alberta, Alberta, Canada; Princeton University, Princeton, NJ
Venue:
Proceedings of the 25th International Conference on Machine Learning
Year:
2008

Abstract

In apprenticeship learning, the goal is to learn a policy in a Markov decision process (MDP) that is at least as good as a policy demonstrated by an expert. The difficulty is that the MDP's true reward function is assumed to be unknown. We show how to frame apprenticeship learning as a linear programming (LP) problem, and show that solving this problem with an off-the-shelf LP solver yields a substantial improvement in running time over existing methods, up to two orders of magnitude faster in our experiments. Additionally, our approach produces stationary policies, while all existing methods for apprenticeship learning output policies that are "mixed", i.e., randomized combinations of stationary policies. The technique used is general enough to convert any mixed policy to a stationary policy.
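
The final claim, that a mixed policy can be converted to a stationary one, can be illustrated with a minimal sketch. It is not code from the paper. It assumes a small discounted tabular MDP and relies on a standard fact about discounted occupancy measures: the occupancy measure of a mixed policy is the weighted average of the occupancy measures of its component policies, and normalizing that average per state gives a stationary policy with the same occupancy measure. The function names (occupancy, mix_to_stationary, value), the toy transition table P, the rewards R, and the discount gamma = 0.9 are all hypothetical and chosen only for illustration.

```python
def occupancy(P, policy, alpha, gamma, iters=2000):
    """Discounted state-action occupancy x[s][a] of a stationary policy.

    P[s][a][t] is the probability of moving from s to t under action a,
    policy[s][a] is the probability of choosing a in s, and alpha[s] is
    the initial state distribution. The infinite sum is truncated after
    `iters` steps, which is an approximation (negligible for gamma < 1).
    """
    n_s, n_a = len(P), len(P[0])
    d = list(alpha)                      # state distribution at the current step
    x = [[0.0] * n_a for _ in range(n_s)]
    discount = 1.0
    for _ in range(iters):
        nxt = [0.0] * n_s
        for s in range(n_s):
            for a in range(n_a):
                mass = d[s] * policy[s][a]
                x[s][a] += discount * mass
                for t in range(n_s):
                    nxt[t] += mass * P[s][a][t]
        d = nxt
        discount *= gamma
    return x


def mix_to_stationary(P, policies, weights, alpha, gamma):
    """Return (stationary policy, mixed occupancy) for a mixed policy that
    picks policies[i] with probability weights[i] at time 0 and follows it."""
    n_s, n_a = len(P), len(P[0])
    x_mix = [[0.0] * n_a for _ in range(n_s)]
    for pol, w in zip(policies, weights):
        x = occupancy(P, pol, alpha, gamma)
        for s in range(n_s):
            for a in range(n_a):
                x_mix[s][a] += w * x[s][a]
    stationary = []
    for s in range(n_s):
        total = sum(x_mix[s])
        if total > 0:
            stationary.append([v / total for v in x_mix[s]])
        else:
            # State never visited: any action distribution works here.
            stationary.append([1.0 / n_a] * n_a)
    return stationary, x_mix


def value(x, R):
    """Expected discounted return for occupancy x and rewards R[s][a]."""
    return sum(x[s][a] * R[s][a] for s in range(len(R)) for a in range(len(R[0])))


# Toy two-state, two-action MDP (hypothetical numbers).
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.0, 1.0]]]
R = [[1.0, 0.0], [0.0, 2.0]]
alpha = [1.0, 0.0]
gamma = 0.9

always_0 = [[1.0, 0.0], [1.0, 0.0]]
always_1 = [[0.0, 1.0], [0.0, 1.0]]

pi, x_mix = mix_to_stationary(P, [always_0, always_1], [0.3, 0.7], alpha, gamma)
v_mixed = value(x_mix, R)
v_stationary = value(occupancy(P, pi, alpha, gamma), R)
print(abs(v_mixed - v_stationary) < 1e-9)
```

Because the stationary policy reproduces the mixed policy's occupancy measure, the two agree in value for every reward function, not only the R used here, which is what matters when the true reward is unknown.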