Minimization methods for non-differentiable functions
Optimal control: linear quadratic methods
ALVINN: an autonomous land vehicle in a neural network
Advances in neural information processing systems 1
Exponentiated gradient versus gradient descent for linear predictors
Information and Computation
Artificial Intelligence Review - Special issue on lazy learning
Markov Decision Processes: Discrete Stochastic Dynamic Programming
Algorithms for Inverse Reinforcement Learning
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
A Framework for Behavioural Cloning
Machine Intelligence 15, Intelligent Agents [St. Catherine's College, Oxford, July 1995]
Approximate solutions to markov decision processes
Apprenticeship learning via inverse reinforcement learning
ICML '04 Proceedings of the twenty-first international conference on Machine learning
Boosting as a Regularized Path to a Maximum Margin Classifier
The Journal of Machine Learning Research
Learning structured prediction models: a large margin approach
ICML '05 Proceedings of the 22nd international conference on Machine learning
Prediction, Learning, and Games
ICML '06 Proceedings of the 23rd international conference on Machine learning
A survey of robot learning from demonstration
Robotics and Autonomous Systems
Maximum entropy inverse reinforcement learning
AAAI'08 Proceedings of the 23rd national conference on Artificial intelligence - Volume 3
Dynamical System Modulation for Robot Learning via Kinesthetic Demonstrations
IEEE Transactions on Robotics
On Learning, Representing, and Generalizing a Task in a Humanoid Robot
IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
Greed is good: algorithmic results for sparse approximation
IEEE Transactions on Information Theory
Learning from Demonstration for Autonomous Navigation in Complex Unstructured Terrain
International Journal of Robotics Research
ECCV'10 Proceedings of the 11th European conference on Computer vision: Part VI
Optimization and learning for rough terrain legged locomotion
International Journal of Robotics Research
Imitation learning in relational domains: a functional-gradient boosting approach
IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Two
Apprenticeship learning with few examples
Neurocomputing
Programming robot behavior remains a challenging task. While it is often easy to abstractly define or even demonstrate a desired behavior, designing a controller that embodies that behavior is difficult, time-consuming, and ultimately expensive. The machine learning paradigm offers the promise of enabling "programming by demonstration" for developing high-performance robotic systems. Unfortunately, many "behavioral cloning" approaches (Bain and Sammut in Machine intelligence agents. London: Oxford University Press, 1995; Pomerleau in Advances in neural information processing systems 1, 1989; LeCun et al. in Advances in neural information processing systems 18, 2006) that utilize classical tools of supervised learning (e.g. decision trees, neural networks, or support vector machines) do not fit the needs of modern robotic systems. These systems are often built atop sophisticated planning algorithms that efficiently reason far into the future; consequently, ignoring these planning algorithms in favor of a supervised learning approach often leads to myopic, poor-quality robot performance.

While planning algorithms have shown success in many real-world applications, ranging from legged locomotion (Chestnutt et al. in Proceedings of the IEEE-RAS international conference on humanoid robots, 2003) to outdoor unstructured navigation (Kelly et al. in Proceedings of the international symposium on experimental robotics (ISER), 2004; Stentz et al. in AUVSI's unmanned systems, 2007), such algorithms rely on fully specified cost functions that map sensor readings and environment models to quantifiable costs. Such cost functions are usually manually designed and programmed. Recently, a set of techniques has been developed that explores learning these functions from expert human demonstration.
These algorithms apply an inverse optimal control approach to find a cost function for which planned behavior mimics an expert's demonstration.

The work we present extends the Maximum Margin Planning (MMP) (Ratliff et al. in Twenty second international conference on machine learning (ICML06), 2006a) framework to admit learning of more powerful, non-linear cost functions. These algorithms, known collectively as LEARCH (LEArning to seaRCH), are simpler to implement than most existing methods, more efficient than previous attempts at non-linearization (Ratliff et al. in NIPS, 2006b), more naturally satisfy common constraints on the cost function, and better represent our prior beliefs about the function's form. We derive and discuss the framework both mathematically and intuitively, and demonstrate practical real-world performance with three applied case studies: legged locomotion, grasp planning, and autonomous outdoor unstructured navigation. The last of these includes hundreds of kilometers of autonomous traversal through complex natural environments. These case studies address key challenges in applying the algorithm in practical settings that utilize state-of-the-art planners and that may be constrained by efficiency requirements and imperfect expert demonstration.
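The inverse optimal control idea above can be sketched as a small toy, with the caveat that this is an illustrative reconstruction and not the authors' implementation: it uses a 4-connected grid planner, replaces LEARCH's fitted regressor with a plain gradient step on linear weights in log-cost space, and omits the loss-augmented (margin) planning step. All names (`dijkstra`, `learch_sketch`, the mud-grid setup) are hypothetical.

```python
import heapq
import math

def dijkstra(cost, start, goal):
    """Cheapest 4-connected path on a grid; cost is paid on entering a cell."""
    rows, cols = len(cost), len(cost[0])
    dist, prev = {start: 0.0}, {}
    pq = [(0.0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            break
        if d > dist.get(u, math.inf):
            continue
        r, c = u
        for v in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= v[0] < rows and 0 <= v[1] < cols:
                nd = d + cost[v[0]][v[1]]
                if nd < dist.get(v, math.inf):
                    dist[v], prev[v] = nd, u
                    heapq.heappush(pq, (nd, v))
    path, u = [goal], goal
    while u != start:
        u = prev[u]
        path.append(u)
    return path[::-1]

def learch_sketch(features, demo, start, goal, iters=30, lr=0.5):
    """Toy LEARCH-style loop: cost(cell) = exp(w . features(cell)), which keeps
    costs positive. Each iteration plans under the current cost, then raises
    cost along the planned path and lowers it along the demonstration, so the
    planner is pushed toward reproducing the expert's behavior."""
    k = len(features[0][0])
    w = [0.0] * k
    for _ in range(iters):
        cost = [[math.exp(sum(wi * fi for wi, fi in zip(w, f))) for f in row]
                for row in features]
        planned = dijkstra(cost, start, goal)
        if planned == demo:          # planner already reproduces the expert
            break
        for r, c in planned:         # planned path: push cost up
            w = [wi + lr * fi for wi, fi in zip(w, features[r][c])]
        for r, c in demo:            # demonstrated path: push cost down
            w = [wi - lr * fi for wi, fi in zip(w, features[r][c])]
    return w

# Hypothetical 3x3 example: the lower-left cells are "mud"; the expert
# demonstration skirts the mud along the top and right edges. Each cell's
# feature vector is [is_mud, bias].
MUD = {(1, 0), (1, 1), (2, 0), (2, 1)}
features = [[[1.0 if (r, c) in MUD else 0.0, 1.0] for c in range(3)]
            for r in range(3)]
demo = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2)]
w = learch_sketch(features, demo, (0, 0), (2, 2))
```

After a few iterations the mud feature acquires positive weight, so the learned cost map makes the planner retrace the demonstrated route without the cost function ever having been specified by hand.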