Learning from demonstration using MDP induced metrics

  • Authors: Francisco S. Melo; Manuel Lopes

  • Affiliations: INESC-ID, Instituto Superior Técnico, Porto Salvo, Portugal; University of Plymouth, Plymouth, Devon, UK

  • Venue: ECML PKDD'10: Proceedings of the 2010 European Conference on Machine Learning and Knowledge Discovery in Databases, Part II
  • Year: 2010

Abstract

In this paper, we address the problem of learning a policy from demonstration. Assuming that the policy to be learned is the optimal policy for an underlying MDP, we propose a novel way of leveraging the underlying MDP structure in a kernel-based approach. Our approach rests on the insight that the MDP structure can be encapsulated in an adequate state-space metric. In particular, we show that, using MDP metrics, we can cast the problem of learning from demonstration as a classification problem and attain generalization performance similar to that of methods based on inverse reinforcement learning, at a much lower online computational cost. Our method also generalizes better than other supervised learning methods that fail to consider the MDP structure.
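
As a rough illustration of the idea of casting learning from demonstration as classification over a state-space metric, the following minimal sketch predicts an action for every state by a kernel-weighted vote over demonstrated state-action pairs. It is not the authors' implementation: the abstract does not specify how the MDP metric is computed, so the distance matrix `dist` is simply assumed to be given, and its values, the Gaussian kernel, the `bandwidth` parameter, and the function name `kernel_policy` are all hypothetical choices made for this example.

```python
import math


def kernel_policy(dist, demo, n_actions, bandwidth=1.0):
    """Predict one action per state by a kernel-weighted vote over demonstrations.

    dist      -- square matrix (list of lists); dist[s][t] is an ASSUMED
                 MDP-induced distance between states s and t
    demo      -- dict mapping each demonstrated state to its demonstrated action
    n_actions -- number of discrete actions
    bandwidth -- width of the Gaussian kernel (hypothetical parameter)
    """
    policy = []
    for s in range(len(dist)):
        votes = [0.0] * n_actions
        for t, a in demo.items():
            # Demonstrations at states close to s (under the metric) weigh more.
            votes[a] += math.exp(-((dist[s][t] / bandwidth) ** 2))
        policy.append(max(range(n_actions), key=votes.__getitem__))
    return policy


# Hypothetical 4-state example: states 0 and 1 are close under the metric,
# as are states 2 and 3. These distances are made up for illustration.
dist = [
    [0.0, 0.1, 0.9, 1.0],
    [0.1, 0.0, 1.0, 0.9],
    [0.9, 1.0, 0.0, 0.1],
    [1.0, 0.9, 0.1, 0.0],
]
demo = {0: 0, 3: 1}  # the expert was observed only in states 0 and 3

policy = kernel_policy(dist, demo, n_actions=2, bandwidth=0.5)
print(policy)  # [0, 0, 1, 1]
```

In this toy setting the undemonstrated states 1 and 2 inherit the action of their nearest demonstrated neighbour, which is the kind of generalization a metric that reflects the MDP structure is meant to enable. All of the online work is a weighted vote, with no reward function being recovered.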