Efficient skill acquisition is crucial for creating versatile robots. One intuitive way to teach a robot new tricks is to demonstrate a task and enable the robot to imitate the demonstrated behavior. This approach is known as imitation learning. Classical methods of imitation learning, such as inverse reinforcement learning or behavioral cloning, suffer from the correspondence problem when the actions (i.e., motor commands, torques, or forces) of the teacher are not observed or when the body of the teacher differs substantially from the robot's, e.g., in its actuation. To address these drawbacks, we propose to learn a robot-specific controller that directly matches robot trajectories with observed ones. We present a novel and robust probabilistic model-based approach for solving a probabilistic trajectory matching problem via policy search. For this purpose, we learn a probabilistic model of the system, which we exploit for mental rehearsal of the current controller by making predictions about future trajectories. These internal simulations allow a controller to be learned without constantly interacting with the real system, which reduces the overall interaction time. Using long-term predictions from this learned model, we train robot-specific controllers that reproduce the expert's distribution of demonstrations without the need to observe motor commands during the demonstration. The strength of our approach is that it addresses the correspondence problem in a principled way. Our method achieves a higher learning speed than both model-based imitation learning based on dynamic movement primitives and trial-and-error-based learning systems with hand-crafted cost functions. We successfully applied our approach to imitating human behavior using a tendon-driven compliant robotic arm. Moreover, we demonstrate the generalization ability of our approach in a multi-task learning setup.
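To make the trajectory-matching idea concrete, the imitation cost can be understood as a divergence between two trajectory distributions: the one predicted by the learned probabilistic model under the current controller, and the one estimated from the expert's demonstrations. The sketch below illustrates this with per-time-step Gaussian marginals and a summed KL divergence; it is a minimal, hypothetical illustration of the principle, not the authors' implementation, and the function names (`kl_gaussian`, `trajectory_matching_cost`) are assumptions for this example.

```python
import numpy as np

def kl_gaussian(mu0, S0, mu1, S1):
    """KL divergence KL(N(mu0, S0) || N(mu1, S1)) between two Gaussians."""
    k = mu0.shape[0]
    S1_inv = np.linalg.inv(S1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(S1_inv @ S0)
                  + diff @ S1_inv @ diff
                  - k
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

def trajectory_matching_cost(pred_means, pred_covs, demo_means, demo_covs):
    """Sum per-time-step KL divergences between the state distribution
    predicted for the current controller and the demonstration distribution.
    A policy-search method would minimize this cost over controller parameters."""
    return sum(kl_gaussian(pm, pc, dm, dc)
               for pm, pc, dm, dc in zip(pred_means, pred_covs,
                                         demo_means, demo_covs))
```

Because the cost compares trajectory distributions rather than motor commands, no teacher actions are needed, which is how this formulation sidesteps the correspondence problem.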