Learning from demonstration using MDP induced metrics

Authors:
Francisco S. Melo; Manuel Lopes
Affiliations:
INESC-ID, Instituto Superior Técnico, Porto Salvo, Portugal; University of Plymouth, Plymouth, Devon, UK
Venue:
ECML PKDD'10 Proceedings of the 2010 European conference on Machine learning and knowledge discovery in databases: Part II
Year:
2010

Abstract

In this paper we address the problem of learning a policy from demonstration. Assuming that the policy to be learned is the optimal policy for an underlying MDP, we propose a novel way of leveraging the underlying MDP structure in a kernel-based approach. Our approach rests on the insight that the MDP structure can be encapsulated in a suitable state-space metric. In particular, we show that, using MDP metrics, we can cast the problem of learning from demonstration as a classification problem and attain generalization performance similar to that of methods based on inverse reinforcement learning, at a much lower online computational cost. Our method also generalizes better than other supervised learning methods that fail to consider the MDP structure.
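
The idea of casting learning from demonstration as classification under an MDP-aware distance can be illustrated with a toy sketch. The abstract does not specify the metric or the classifier, so everything below is an assumption made for illustration: a hypothetical 2x5 grid with a wall between the rows, a shortest-path distance over the transition graph standing in for an MDP metric, and a simple similarity-weighted vote standing in for the kernel-based classifier. It is not the authors' method.

```python
import math
from collections import deque

# Hypothetical toy MDP (an assumption, not from the paper): a 2x5 grid whose
# two rows are separated by a wall, except in the last column.
ROWS, COLS = 2, 5
ACTIONS = {"left": (0, -1), "right": (0, 1), "up": (-1, 0), "down": (1, 0)}


def step(state, action):
    """Deterministic transition; vertical moves only succeed in the last column."""
    r, c = state
    dr, dc = ACTIONS[action]
    nr, nc = r + dr, c + dc
    if not (0 <= nr < ROWS and 0 <= nc < COLS):
        return state  # bumping into the outer boundary leaves the state unchanged
    if dr != 0 and c != COLS - 1:
        return state  # the wall blocks vertical moves
    return (nr, nc)


def mdp_distance(s, t):
    """Shortest-path length from s to t in the transition graph (breadth-first search).

    A stand-in for an MDP-induced metric; it is symmetric here because every
    move in this toy grid is reversible.
    """
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return dist[u]
        for a in ACTIONS:
            v = step(u, a)
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return math.inf


def euclidean_distance(s, t):
    """Distance that ignores the MDP structure entirely."""
    return math.hypot(s[0] - t[0], s[1] - t[1])


def kernel_vote(query, demos, distance, bandwidth=1.0):
    """Pick the action with the largest similarity-weighted vote.

    Each demonstrated (state, action) pair votes with weight exp(-d / bandwidth).
    """
    scores = {}
    for state, action in demos:
        weight = math.exp(-distance(query, state) / bandwidth)
        scores[action] = scores.get(action, 0.0) + weight
    return max(scores, key=scores.get)


# Demonstrations from a hypothetical expert heading to the goal at (1, 0).
demos = [((0, 3), "right"), ((0, 4), "down"), ((1, 1), "left")]
query = (0, 1)  # directly above a demonstrated state, but on the other side of the wall

print(kernel_vote(query, demos, mdp_distance))        # right
print(kernel_vote(query, demos, euclidean_distance))  # left
```

In this toy setting the Euclidean vote copies the action of the state just across the wall, while the transition-aware distance treats that state as seven steps away and follows the nearby demonstration in the same corridor. The point is only to show how a distance that reflects the MDP structure can change what a classifier generalizes from.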