Maximum margin planning

Authors:
Nathan D. Ratliff;J. Andrew Bagnell;Martin A. Zinkevich
Affiliations:
Carnegie Mellon University, Pittsburgh, PA;Carnegie Mellon University, Pittsburgh, PA;University of Alberta, Edmonton, Canada
Venue:
ICML '06 Proceedings of the 23rd international conference on Machine learning
Year:
2006

Abstract

Imitation learning of sequential, goal-directed behavior by standard supervised techniques is often difficult. We frame learning such behaviors as a maximum margin structured prediction problem over a space of policies. In this approach, we learn mappings from features to costs so that an optimal policy in an MDP with these costs mimics the expert's behavior. Further, we demonstrate a simple, provably efficient approach to structured maximum margin learning, based on the subgradient method, that leverages existing fast algorithms for inference. Although the technique is general, it is particularly relevant to problems where A* and dynamic programming make learning policies tractable beyond the limits of a QP formulation. We demonstrate our approach on route planning for outdoor mobile robots, where the behavior a designer wishes a planner to execute is often clear, while specifying cost functions that engender this behavior is much more difficult.
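
The subgradient idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation. The 3x3 grid, the two features (a rough-terrain indicator and a constant), the per-cell loss, the step size, the regularizer, and the names `plan`, `feature_counts`, and `train` are all assumptions made for illustration. A dynamic-programming planner stands in for the "fast inference" step: each iteration plans under loss-augmented costs, then moves the weights so the expert's path becomes cheaper relative to the planner's current choice.

```python
import numpy as np

def plan(cost):
    """Min-cost path from top-left to bottom-right using only right/down moves.

    The grid is acyclic under these moves, so dynamic programming is exact
    even when some (loss-augmented) cell costs are negative.
    """
    rows, cols = cost.shape
    best = np.full((rows, cols), np.inf)
    best[0, 0] = cost[0, 0]
    for r in range(rows):
        for c in range(cols):
            if r > 0:
                best[r, c] = min(best[r, c], best[r - 1, c] + cost[r, c])
            if c > 0:
                best[r, c] = min(best[r, c], best[r, c - 1] + cost[r, c])
    # Backtrack from the goal to recover the path.
    r, c = rows - 1, cols - 1
    path = [(r, c)]
    while (r, c) != (0, 0):
        if r > 0 and (c == 0 or best[r - 1, c] <= best[r, c - 1]):
            r -= 1
        else:
            c -= 1
        path.append((r, c))
    return path[::-1]

def feature_counts(features, path):
    """Sum of per-cell feature vectors along a path."""
    return np.sum([features[r, c] for r, c in path], axis=0)

def train(features, expert_path, steps=200, lr=0.1, reg=0.01):
    """Subgradient descent on a regularized max-margin objective (assumed form)."""
    rows, cols, dim = features.shape
    # Assumed loss: 1 for every cell the expert did not visit.
    loss = np.ones((rows, cols))
    for r, c in expert_path:
        loss[r, c] = 0.0
    f_expert = feature_counts(features, expert_path)
    w = np.zeros(dim)
    for t in range(1, steps + 1):
        # Loss-augmented planning: off-expert cells look cheaper by their loss.
        rival = plan(features @ w - loss)
        grad = f_expert - feature_counts(features, rival) + reg * w
        w -= lr / np.sqrt(t) * grad  # diminishing step size
    return w

# Hypothetical terrain: 1 marks rough cells that the expert avoids.
terrain = np.array([[0, 1, 1],
                    [0, 1, 1],
                    [0, 0, 0]], dtype=float)
features = np.stack([terrain, np.ones_like(terrain)], axis=-1)
expert_path = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]

w = train(features, expert_path)
learned_path = plan(features @ w)
print("weights:", w)
print("learned path:", learned_path)
```

In this toy setting the weight on the rough-terrain feature becomes positive, so planning with the learned costs reproduces the demonstrated route. Replacing `plan` with A* or another planner would follow the same pattern, provided the planner can handle the loss-augmented costs.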