Preference-based policy learning

  • Authors:
  • Riad Akrour, Marc Schoenauer, Michele Sebag

  • Affiliations:
  • TAO, CNRS, INRIA, Université Paris-Sud (all authors)

  • Venue:
  • ECML PKDD'11: Proceedings of the 2011 European Conference on Machine Learning and Knowledge Discovery in Databases - Volume Part I
  • Year:
  • 2011

Abstract

Many machine learning approaches in robotics, based on reinforcement learning, inverse optimal control or direct policy learning, critically rely on robot simulators. This paper investigates simulator-free direct policy learning, called Preference-based Policy Learning (PPL). PPL iterates a four-step process: the robot demonstrates a candidate policy; the expert ranks this policy relative to other candidate policies according to her preferences; these preferences are used to learn a policy return estimate; the robot uses the policy return estimate to build new candidate policies, and the process is iterated until the desired behavior is obtained. PPL requires that a good representation of the policy search space be available, enabling one to learn accurate policy return estimates and limiting the human ranking effort needed to yield a good policy. Furthermore, this representation cannot use informed features (e.g., how far the robot is from any target) due to the simulator-free setting. As a second contribution, this paper proposes a representation based on the agnostic exploitation of the robotic log. The convergence of PPL is analytically studied, and its experimental validation on two problems, involving a single robot in a maze and two interacting robots, is presented.
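To make the four-step iteration concrete, below is a minimal Python sketch of the PPL loop as described in the abstract. It is not the authors' implementation: the callables demonstrate, expert_rank, fit_return_estimate and propose_policies are hypothetical placeholders standing in for the robot demonstration, the expert's preference ranking, the learned return estimate, and the candidate-generation step.

```python
def preference_based_policy_learning(initial_policy, demonstrate, expert_rank,
                                      fit_return_estimate, propose_policies,
                                      n_iterations=20, n_candidates=10):
    """Sketch of the PPL iteration from the abstract (helper names are illustrative).

    demonstrate(policy)            -> behavior log of running the policy on the robot
    expert_rank(archive)           -> expert's preference ordering over demonstrated policies
    fit_return_estimate(archive, ranking) -> callable mapping a policy to an estimated return
    propose_policies(policy, n)    -> list of new candidate policies
    """
    archive = []          # (policy, behavior) pairs demonstrated so far
    policy = initial_policy

    for _ in range(n_iterations):
        # 1. The robot demonstrates the candidate policy (simulator-free:
        #    the behavior is logged on the robot itself).
        behavior = demonstrate(policy)
        archive.append((policy, behavior))

        # 2. The expert ranks this policy relative to previously seen ones.
        ranking = expert_rank(archive)

        # 3. A policy return estimate is learned from the preference ranking.
        return_estimate = fit_return_estimate(archive, ranking)

        # 4. New candidates are generated; the one maximizing the learned
        #    return estimate is demonstrated in the next iteration.
        candidates = propose_policies(policy, n_candidates)
        policy = max(candidates, key=return_estimate)

    return policy
```

The sketch only illustrates the control flow; the paper's contributions (the preference-based return estimate and the log-based, feature-agnostic policy representation) determine how the ranking and estimation steps are actually realized.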