We consider the problem of imitation learning when the examples provided by a human expert are scarce. Apprenticeship learning via inverse reinforcement learning provides an efficient tool for generalizing from such examples, based on the assumption that the expert's policy maximizes a value function that is a linear combination of state and action features. Most apprenticeship learning algorithms summarize the expert's policy using only simple empirical averages of the features observed in the demonstrations. However, this statistic is reliable only when the number of examples is large enough to cover most of the states, or when the dynamics of the system are nearly deterministic. In this paper, we show that when the dynamics are stochastic, the quality of the learned policies is sensitive to the error in estimating these feature averages. To reduce this error, we introduce two new approaches for bootstrapping the demonstrations, under the assumptions that the expert is near-optimal and the dynamics of the system are known. In the first approach, the expert's examples are used to learn a reward function, and additional examples are then generated from the corresponding optimal policy. The second approach uses a transfer technique, known as graph homomorphism, to generalize the expert's actions to unvisited regions of the state space. Empirical results on simulated robot navigation problems show that our approaches learn sufficiently good policies from a significantly smaller number of examples.
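For concreteness, the sketch below illustrates the statistic discussed above: the empirical average of discounted feature vectors over the expert's demonstrations. This is a minimal illustration, not the paper's implementation; the feature map `phi`, the discount factor `gamma`, and the trajectory format are hypothetical placeholders.

```python
import numpy as np

def empirical_feature_expectations(trajectories, phi, gamma=0.9):
    """Monte-Carlo estimate of the expert's feature expectations
    mu_E = E[ sum_t gamma^t * phi(s_t, a_t) ], averaged over the
    demonstrated trajectories.

    trajectories: list of [(state, action), ...] sequences
    phi:          feature map (state, action) -> np.ndarray
    gamma:        discount factor in (0, 1)
    """
    total = None
    for traj in trajectories:
        # Discounted sum of feature vectors along one trajectory.
        discounted = sum(gamma**t * phi(s, a) for t, (s, a) in enumerate(traj))
        total = discounted if total is None else total + discounted
    return total / len(trajectories)

# Toy usage (hypothetical): 2 states, 2 actions, one-hot features.
phi = lambda s, a: np.eye(4)[2 * s + a]        # (state, action) indicator
demos = [[(0, 1), (1, 0)], [(0, 1), (1, 1)]]   # two short expert trajectories
print(empirical_feature_expectations(demos, phi))
```

With only a handful of trajectories and stochastic dynamics, this estimator has high variance, which is precisely the failure mode that the bootstrapping approaches described above are designed to mitigate.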