Exploration and apprenticeship learning in reinforcement learning

Authors:
Pieter Abbeel; Andrew Y. Ng
Affiliations:
Stanford University, Stanford, CA; Stanford University, Stanford, CA
Venue:
ICML '05 Proceedings of the 22nd international conference on Machine learning
Year:
2005

References (11)

Optimal control: linear quadratic methods.
ALVINN: an autonomous land vehicle in a neural network. Advances in neural information processing systems 1.
Learning to fly. ML92 Proceedings of the ninth international workshop on Machine learning.
Reinforcement Learning.
Neuro-Dynamic Programming.
Near-Optimal Reinforcement Learning in Polynomial Time. Machine Learning.
Practical Reinforcement Learning in Continuous Spaces. ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning.
Efficient Reinforcement Learning in Factored MDPs. IJCAI '99 Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence.
Learning Movement Sequences from Demonstration. ICDL '02 Proceedings of the 2nd International Conference on Development and Learning.
R-max - a general polynomial time algorithm for near-optimal reinforcement learning. The Journal of Machine Learning Research.
Apprenticeship learning via inverse reinforcement learning. ICML '04 Proceedings of the twenty-first international conference on Machine learning.

Cited by (22)

Qualitative reinforcement learning. ICML '06 Proceedings of the 23rd international conference on Machine learning.
Active reinforcement learning. Proceedings of the 25th international conference on Machine learning.
Autonomous agent learning using an actor-critic algorithm and behavior models. Proceedings of the 7th international joint conference on Autonomous agents and multiagent systems - Volume 3.
Proposal of Exploitation-Oriented Learning PS-r#. IDEAL '08 Proceedings of the 9th International Conference on Intelligent Data Engineering and Automated Learning.
Probabilistic Inference for Fast Learning in Control. Recent Advances in Reinforcement Learning.
A survey of robot learning from demonstration. Robotics and Autonomous Systems.
Learning Actions through Imitation and Exploration: Towards Humanoid Robots That Learn from Humans. Creating Brain-Like Intelligence.
Neuroevolutionary reinforcement learning for generalized helicopter control. Proceedings of the 11th Annual conference on Genetic and evolutionary computation.
Real-time reinforcement learning by sequential Actor-Critics and experience replay. Neural Networks.
Transfer Learning for Reinforcement Learning Domains: A Survey. The Journal of Machine Learning Research.
Provably Efficient Learning with Typed Parametric Models. The Journal of Machine Learning Research.
Autonomous Helicopter Aerobatics through Apprenticeship Learning. International Journal of Robotics Research.
Teacher feedback to scaffold and refine demonstrated motion primitives on a mobile robot. Robotics and Autonomous Systems.
Policy adaptation with tactile feedback. Proceedings of the 6th international conference on Human-robot interaction.
Reinforcement learning and apprenticeship learning for robotic control. ALT'06 Proceedings of the 17th international conference on Algorithmic Learning Theory.
Improvement of systems management policies using hybrid reinforcement learning. ECML'06 Proceedings of the 17th European conference on Machine Learning.
Tactile Guidance for Policy Adaptation. Foundations and Trends in Robotics.
Evaluation of the improved penalty avoiding rational policy making algorithm in real world environment. ACIIDS'12 Proceedings of the 4th Asian conference on Intelligent Information and Database Systems - Volume Part I.
Apprenticeship learning with few examples. Neurocomputing.
Undesired state-action prediction in multi-agent reinforcement learning for linked multi-component robotic system control. Information Sciences: an International Journal.
2013 Special Issue: Autonomous reinforcement learning with experience replay. Neural Networks.
Intelligent Cooperative Control Architecture: A Framework for Performance Improvement Using Safe Learning. Journal of Intelligent and Robotic Systems.

Abstract

We consider reinforcement learning in systems with unknown dynamics. Algorithms such as E3 (Kearns and Singh, 2002) learn near-optimal policies by using "exploration policies" to drive the system towards poorly modeled states, so as to encourage exploration. But this makes these algorithms impractical for many systems; for example, on an autonomous helicopter, overly aggressive exploration may well result in a crash. In this paper, we consider the apprenticeship learning setting in which a teacher demonstration of the task is available. We show that, given the initial demonstration, no explicit exploration is necessary, and we can attain near-optimal performance (compared to the teacher) simply by repeatedly executing "exploitation policies" that try to maximize rewards. In finite-state MDPs, our algorithm scales polynomially in the number of states; in continuous-state linear dynamical systems, it scales polynomially in the dimension of the state. These results are proved using a martingale construction over relative losses.
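The abstract describes the procedure only at a high level. As a rough illustration of the finite-state case, the Python sketch below seeds a maximum-likelihood transition model with teacher demonstrations and then repeats three steps: refit the model on all data collected so far, compute the policy that is optimal for the fitted model, and run that exploitation policy on the real system, keeping the new transitions. There is no separate E3-style exploration policy. This is a sketch under stated assumptions, not the authors' algorithm. The known state reward R[s], the time-indexed finite-horizon policies, the self-loop default for never-visited state-action pairs, the stopping rule (stop once the measured return is within eps of the model's prediction), and every function name (rollout, mean_return, fit_model, plan, exploit_only_learning) are hypothetical choices made for illustration. The sketch also carries none of the paper's polynomial bounds.

```python
import random

# Hypothetical sketch: finite-state, finite-horizon MDP with a known reward R[s].
# P[s][a] is a list of next-state probabilities; a policy is time-indexed: policy[t][s] -> action.


def rollout(P, R, policy, horizon, start, rng):
    """Run one episode on the system P; return (list of (s, a, s') transitions, total reward)."""
    s, total, transitions = start, 0.0, []
    for t in range(horizon):
        a = policy[t][s]
        total += R[s]
        s_next = rng.choices(range(len(P[s][a])), weights=P[s][a])[0]
        transitions.append((s, a, s_next))
        s = s_next
    return transitions, total


def mean_return(P, R, policy, horizon, start, trials, rng):
    """Monte Carlo estimate of a policy's expected total reward on the system P."""
    return sum(rollout(P, R, policy, horizon, start, rng)[1] for _ in range(trials)) / trials


def fit_model(data, n_states, n_actions):
    """Maximum-likelihood transition estimates from all observed transitions.

    Assumption (hypothetical): a never-visited (s, a) pair is modeled as a self-loop.
    """
    counts = [[[0] * n_states for _ in range(n_actions)] for _ in range(n_states)]
    for s, a, s_next in data:
        counts[s][a][s_next] += 1
    P_hat = []
    for s in range(n_states):
        row = []
        for a in range(n_actions):
            n = sum(counts[s][a])
            if n == 0:
                row.append([1.0 if j == s else 0.0 for j in range(n_states)])
            else:
                row.append([c / n for c in counts[s][a]])
        P_hat.append(row)
    return P_hat


def plan(P, R, horizon):
    """Finite-horizon value iteration; returns the greedy time-indexed policy and V_0."""
    n_states, n_actions = len(P), len(P[0])
    V = [0.0] * n_states
    policy = [None] * horizon
    for t in reversed(range(horizon)):
        Q = [[R[s] + sum(p * V[j] for j, p in enumerate(P[s][a])) for a in range(n_actions)]
             for s in range(n_states)]
        # max() keeps the first maximal action, so ties go to the lowest action index.
        policy[t] = [max(range(n_actions), key=Q[s].__getitem__) for s in range(n_states)]
        V = [max(q) for q in Q]
    return policy, V


def exploit_only_learning(P_true, R, teacher, horizon, start, n_demos, trials, eps, max_iters, seed=0):
    """Teacher demos first, then only exploitation policies: no explicit exploration step."""
    rng = random.Random(seed)
    n_states, n_actions = len(P_true), len(P_true[0])
    data = []
    for _ in range(n_demos):  # 1. record the teacher's demonstrations
        data += rollout(P_true, R, teacher, horizon, start, rng)[0]
    policy = teacher  # returned unchanged if max_iters == 0
    for _ in range(max_iters):
        P_hat = fit_model(data, n_states, n_actions)  # 2. refit the model on all data so far
        policy, V_hat = plan(P_hat, R, horizon)  # 3. policy that is optimal for the model
        real = 0.0
        for _ in range(trials):  # 4. run it on the real system, keeping the data
            transitions, ret = rollout(P_true, R, policy, horizon, start, rng)
            data += transitions
            real += ret
        if real / trials >= V_hat[start] - eps:  # 5. assumed stopping rule
            break
    return policy


if __name__ == "__main__":
    # Deterministic 3-state chain: action 1 moves right, action 0 moves left; only state 2 pays.
    P = [[[1, 0, 0], [0, 1, 0]],
         [[1, 0, 0], [0, 0, 1]],
         [[0, 1, 0], [0, 0, 1]]]
    R = [0.0, 0.0, 1.0]
    H = 5
    teacher = [[1, 1, 1] for _ in range(H)]
    learned = exploit_only_learning(P, R, teacher, H, start=0, n_demos=2, trials=2, eps=0.5, max_iters=10)
    rng = random.Random(1)
    print(mean_return(P, R, teacher, H, 0, 5, rng), mean_return(P, R, learned, H, 0, 5, rng))  # prints: 3.0 3.0
```

In the toy chain, the first exploitation policy breaks a model tie toward an action the teacher never took at the rewarding state. Running it on the real system reveals that action's true effect, and the refit model then corrects the policy. This is roughly the intuition the abstract appeals to: data gathered while exploiting repairs the model where it matters, without a dedicated exploration policy.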