Apprenticeship learning and reinforcement learning with application to robotic control

Authors:
Pieter Abbeel
Affiliations:
Stanford University
Venue:
Apprenticeship learning and reinforcement learning with application to robotic control
Year:
2008

Citing 0
Cited 6

Improving management of Anemia in end stage renal disease using reinforcement learning
IJCNN'09 Proceedings of the 2009 international joint conference on Neural Networks

Analysis of Inverse Reinforcement Learning with Perturbed Demonstrations
Proceedings of the 2010 conference on ECAI 2010: 19th European Conference on Artificial Intelligence

Learning from demonstration using MDP induced metrics
ECML PKDD'10 Proceedings of the 2010 European conference on Machine learning and knowledge discovery in databases: Part II

Trajectory generation and control for precise aggressive maneuvers with quadrotors
International Journal of Robotics Research

Opportunities and challenges with autonomous micro aerial vehicles
International Journal of Robotics Research

Integrating generic sensor fusion algorithms with sound state representations through encapsulation of manifolds
Information Fusion

Quantified Score

Hi-index	0.01

Abstract

Many problems in robotics have unknown, stochastic, high-dimensional, and highly nonlinear dynamics, and pose significant challenges to both traditional control methods and reinforcement learning algorithms. Some of the key difficulties that arise in these problems are: (i) It is often difficult to write down, in closed form, a formal specification of the control task. For example, what is the objective function for "flying well"? (ii) It is often difficult to build a good dynamics model because of both data collection and data modeling challenges (similar to the "exploration problem" in reinforcement learning). (iii) It is often computationally expensive to find closed-loop controllers for high-dimensional, stochastic domains. We describe learning algorithms with formal performance guarantees which show that these problems can be efficiently addressed in the apprenticeship learning setting, that is, the setting in which expert demonstrations of the task are available. Our algorithms are guaranteed to return a control policy with performance comparable to the expert's, evaluated on the same task and in the same (typically stochastic, high-dimensional, and nonlinear) environment as the expert. Besides having theoretical guarantees, our algorithms have also enabled us to solve some previously unsolved real-world control problems. They have enabled a quadruped robot to traverse challenging, previously unseen terrain, and they have significantly extended the state of the art in autonomous helicopter flight. Our helicopter has performed by far the most challenging aerobatic maneuvers performed by any autonomous helicopter to date, including maneuvers such as continuous in-place flips, rolls, and tic-tocs, which only exceptional expert human pilots can fly. Our aerobatic flight performance is comparable to that of the best human pilots.