Maximum entropy inverse reinforcement learning

Authors:
Brian D. Ziebart; Andrew Maas; J. Andrew Bagnell; Anind K. Dey
Affiliation (all authors):
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA
Venue:
AAAI'08: Proceedings of the 23rd National Conference on Artificial Intelligence, Volume 3
Year:
2008

Abstract

Recent research has shown the benefit of framing problems of imitation learning as solutions to Markov Decision Problems. This approach reduces learning to the problem of recovering a utility function that makes the behavior induced by a near-optimal policy closely mimic demonstrated behavior. In this work, we develop a probabilistic approach based on the principle of maximum entropy. Our approach provides a well-defined, globally normalized distribution over decision sequences, while providing the same performance guarantees as existing methods. We develop our technique in the context of modeling real-world navigation and driving behaviors where collected data is inherently noisy and imperfect. Our probabilistic approach enables modeling of route preferences as well as a powerful new approach to inferring destinations and routes based on partial trajectories.
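
The abstract does not spell out the model, so the following is only a minimal sketch of the idea it describes, under stated assumptions. It assumes the commonly cited form of the maximum entropy model for deterministic settings, in which each candidate path is weighted by the exponential of a linear reward over its feature counts and the weights are normalized over all paths. The route names, feature values, demonstrations, and the helper names `path_distribution`, `expected_features`, `empirical_features`, and `fit` are hypothetical, chosen for illustration; enumerating every path is feasible only for a toy problem like this one.

```python
import math

# Hypothetical toy data: three candidate routes between the same endpoints,
# each summarized by feature counts [highway miles, side-street miles].
PATH_FEATURES = {
    "highway": [4.0, 0.0],
    "mixed": [2.0, 2.0],
    "side_streets": [0.0, 3.0],
}


def path_distribution(theta, path_features):
    """Globally normalized P(path | theta), proportional to exp(theta . f_path)."""
    scores = {
        name: sum(t * f for t, f in zip(theta, feats))
        for name, feats in path_features.items()
    }
    max_score = max(scores.values())  # subtract the max for numerical stability
    weights = {name: math.exp(s - max_score) for name, s in scores.items()}
    z = sum(weights.values())  # partition function over all enumerated paths
    return {name: w / z for name, w in weights.items()}


def expected_features(theta, path_features):
    """Feature counts expected under the current path distribution."""
    probs = path_distribution(theta, path_features)
    dim = len(theta)
    return [
        sum(probs[name] * path_features[name][k] for name in path_features)
        for k in range(dim)
    ]


def empirical_features(demos, path_features):
    """Average feature counts of the demonstrated paths."""
    dim = len(next(iter(path_features.values())))
    return [
        sum(path_features[name][k] for name in demos) / len(demos)
        for k in range(dim)
    ]


def fit(demos, path_features, lr=0.2, steps=5000):
    """Gradient ascent on the demonstration log-likelihood.

    The gradient is (empirical feature counts - expected feature counts),
    so the fitted model matches the demonstrated feature counts.
    """
    dim = len(next(iter(path_features.values())))
    theta = [0.0] * dim
    target = empirical_features(demos, path_features)
    for _ in range(steps):
        expected = expected_features(theta, path_features)
        theta = [t + lr * (a - b) for t, a, b in zip(theta, target, expected)]
    return theta


# Hypothetical, imperfect demonstrations: mostly highway, occasionally not.
demos = ["highway", "highway", "mixed", "highway", "side_streets"]
theta = fit(demos, PATH_FEATURES)
probs = path_distribution(theta, PATH_FEATURES)
```

Because the model assigns nonzero probability to every path, occasional suboptimal demonstrations shift the learned weights rather than contradicting the model, which is one way to read the abstract's point about noisy and imperfect data.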