Minimization methods for non-differentiable functions
Optimal control: linear quadratic methods
ALVINN: an autonomous land vehicle in a neural network
Advances in neural information processing systems 1
Exponentiated gradient versus gradient descent for linear predictors
Information and Computation
Artificial Intelligence Review - Special issue on lazy learning
Markov Decision Processes: Discrete Stochastic Dynamic Programming
Algorithms for Inverse Reinforcement Learning
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
A Framework for Behavioural Cloning
Machine Intelligence 15, Intelligent Agents [St. Catherine's College, Oxford, July 1995]
Approximate solutions to markov decision processes
Apprenticeship learning via inverse reinforcement learning
ICML '04 Proceedings of the twenty-first international conference on Machine learning
Boosting as a Regularized Path to a Maximum Margin Classifier
The Journal of Machine Learning Research
Learning structured prediction models: a large margin approach
ICML '05 Proceedings of the 22nd international conference on Machine learning
Prediction, Learning, and Games
ICML '06 Proceedings of the 23rd international conference on Machine learning
A survey of robot learning from demonstration
Robotics and Autonomous Systems
Maximum entropy inverse reinforcement learning
AAAI'08 Proceedings of the 23rd national conference on Artificial intelligence - Volume 3
Dynamical System Modulation for Robot Learning via Kinesthetic Demonstrations
IEEE Transactions on Robotics
On Learning, Representing, and Generalizing a Task in a Humanoid Robot
IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
Greed is good: algorithmic results for sparse approximation
IEEE Transactions on Information Theory
Learning from Demonstration for Autonomous Navigation in Complex Unstructured Terrain
International Journal of Robotics Research
ECCV'10 Proceedings of the 11th European conference on Computer vision: Part VI
Optimization and learning for rough terrain legged locomotion
International Journal of Robotics Research
Imitation learning in relational domains: a functional-gradient boosting approach
IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Two
Apprenticeship learning with few examples
Neurocomputing
Programming robot behavior remains a challenging task. While it is often easy to abstractly define or even demonstrate a desired behavior, designing a controller that embodies that behavior is difficult, time-consuming, and ultimately expensive. The machine learning paradigm offers the promise of enabling "programming by demonstration" for developing high-performance robotic systems. Unfortunately, many "behavioral cloning" approaches (Bain and Sammut in Machine intelligence agents. London: Oxford University Press, 1995; Pomerleau in Advances in neural information processing systems 1, 1989; LeCun et al. in Advances in neural information processing systems 18, 2006) that utilize classical tools of supervised learning (e.g. decision trees, neural networks, or support vector machines) do not fit the needs of modern robotic systems. These systems are often built atop sophisticated planning algorithms that efficiently reason far into the future; consequently, ignoring these planning algorithms in favor of a supervised learning approach often leads to myopic, poor-quality robot performance.

While planning algorithms have shown success in many real-world applications, ranging from legged locomotion (Chestnutt et al. in Proceedings of the IEEE-RAS international conference on humanoid robots, 2003) to outdoor unstructured navigation (Kelly et al. in Proceedings of the international symposium on experimental robotics (ISER), 2004; Stentz et al. in AUVSI's unmanned systems, 2007), such algorithms rely on fully specified cost functions that map sensor readings and environment models to quantifiable costs. Such cost functions are usually manually designed and programmed. Recently, a set of techniques has been developed that explores learning these functions from expert human demonstration.
These algorithms apply an inverse optimal control approach to find a cost function for which planned behavior mimics an expert's demonstration.

The work we present extends the Maximum Margin Planning (MMP) (Ratliff et al. in Twenty second international conference on machine learning (ICML06), 2006a) framework to admit learning of more powerful, non-linear cost functions. These algorithms, known collectively as LEARCH (LEArning to seaRCH), are simpler to implement than most existing methods, more efficient than previous attempts at non-linearization (Ratliff et al. in NIPS, 2006b), more naturally satisfy common constraints on the cost function, and better represent our prior beliefs about the function's form. We derive and discuss the framework both mathematically and intuitively, and demonstrate practical real-world performance with three applied case studies: legged locomotion, grasp planning, and autonomous outdoor unstructured navigation. The last of these includes hundreds of kilometers of autonomous traversal through complex natural environments. These case studies address key challenges in applying the algorithm in practical settings that utilize state-of-the-art planners and that may be constrained by efficiency requirements and imperfect expert demonstration.
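The inverse optimal control idea above can be sketched as a small toy, with the caveat that this is an illustrative reconstruction and not the authors' implementation: it uses a 4-connected grid planner, replaces LEARCH's fitted regressor with a plain gradient step on linear weights in log-cost space, and omits the loss-augmented (margin) planning step. All names (`dijkstra`, `learch_sketch`, the mud-grid setup) are hypothetical.

```python
import heapq
import math

def dijkstra(cost, start, goal):
    """Cheapest 4-connected path on a grid; cost is paid on entering a cell."""
    rows, cols = len(cost), len(cost[0])
    dist, prev = {start: 0.0}, {}
    pq = [(0.0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            break
        if d > dist.get(u, math.inf):
            continue
        r, c = u
        for v in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= v[0] < rows and 0 <= v[1] < cols:
                nd = d + cost[v[0]][v[1]]
                if nd < dist.get(v, math.inf):
                    dist[v], prev[v] = nd, u
                    heapq.heappush(pq, (nd, v))
    path, u = [goal], goal
    while u != start:
        u = prev[u]
        path.append(u)
    return path[::-1]

def learch_sketch(features, demo, start, goal, iters=30, lr=0.5):
    """Toy LEARCH-style loop: cost(cell) = exp(w . features(cell)), which keeps
    costs positive. Each iteration plans under the current cost, then raises
    cost along the planned path and lowers it along the demonstration, so the
    planner is pushed toward reproducing the expert's behavior."""
    k = len(features[0][0])
    w = [0.0] * k
    for _ in range(iters):
        cost = [[math.exp(sum(wi * fi for wi, fi in zip(w, f))) for f in row]
                for row in features]
        planned = dijkstra(cost, start, goal)
        if planned == demo:          # planner already reproduces the expert
            break
        for r, c in planned:         # planned path: push cost up
            w = [wi + lr * fi for wi, fi in zip(w, features[r][c])]
        for r, c in demo:            # demonstrated path: push cost down
            w = [wi - lr * fi for wi, fi in zip(w, features[r][c])]
    return w

# Hypothetical 3x3 example: the lower-left cells are "mud"; the expert
# demonstration skirts the mud along the top and right edges. Each cell's
# feature vector is [is_mud, bias].
MUD = {(1, 0), (1, 1), (2, 0), (2, 1)}
features = [[[1.0 if (r, c) in MUD else 0.0, 1.0] for c in range(3)]
            for r in range(3)]
demo = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2)]
w = learch_sketch(features, demo, (0, 0), (2, 2))
```

After a few iterations the mud feature acquires positive weight, so the learned cost map makes the planner retrace the demonstrated route without the cost function ever having been specified by hand.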