APRIL: active preference learning-based reinforcement learning

Authors:
Riad Akrour;Marc Schoenauer;Michèle Sebag
Affiliations:
TAO, CNRS − INRIA − LRI, Université Paris-Sud, Orsay Cedex, France;TAO, CNRS − INRIA − LRI, Université Paris-Sud, Orsay Cedex, France;TAO, CNRS − INRIA − LRI, Université Paris-Sud, Orsay Cedex, France
Venue:
ECML PKDD'12 Proceedings of the 2012 European conference on Machine Learning and Knowledge Discovery in Databases - Volume Part II
Year:
2012

Abstract

This paper focuses on reinforcement learning (RL) with limited prior knowledge. In swarm robotics, for instance, the expert can hardly design a reward function or demonstrate the target behavior, which rules out both standard RL and inverse reinforcement learning. Even with limited expertise, however, the human expert can often still express preferences and rank the agent's demonstrations. Earlier work presented an iterative preference-based RL framework in which expert preferences are used to learn an approximate policy return, thus enabling the agent to perform direct policy search. At each iteration, the agent selects a new candidate policy and demonstrates it; the expert ranks the new demonstration relative to the previous best one; and this ranking feedback lets the agent refine the approximate policy return. In this paper, preference-based reinforcement learning is combined with active ranking in order to reduce the number of ranking queries the expert must answer before a satisfactory policy is found. Experiments on the mountain car and cancer treatment testbeds show that a couple of dozen rankings suffice to learn a competent policy.
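
The iterative loop described in the abstract (select a candidate, demonstrate it, have the expert rank it against the previous best, refine the approximate policy return) can be illustrated with a toy sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the simulated expert and its hidden `TARGET`, the `features` map, the perceptron-style ranking update, and the greedy choice of the candidate with the highest surrogate value. In particular, the greedy choice is a simple stand-in and not the active ranking criterion of APRIL.

```python
import random

TARGET = [0.5, -0.3]  # hypothetical optimum, known only to the simulated expert


def expert_prefers(candidate, incumbent):
    """Simulated expert: True if candidate is strictly closer to TARGET."""
    def dist(theta):
        return sum((t - g) ** 2 for t, g in zip(theta, TARGET))
    return dist(candidate) < dist(incumbent)


def features(theta):
    """Describe a demonstration by its parameters and their squares."""
    return list(theta) + [t * t for t in theta]


def surrogate(w, theta):
    """Approximate policy return: linear in the demonstration features."""
    return sum(wi * fi for wi, fi in zip(w, features(theta)))


def fit_surrogate(preferences, dim, epochs=50):
    """Perceptron-style ranking: push each preferred item above the other."""
    w = [0.0] * dim
    for _ in range(epochs):
        for winner, loser in preferences:
            diff = [a - b for a, b in zip(features(winner), features(loser))]
            if sum(wi * di for wi, di in zip(w, diff)) <= 0.0:
                w = [wi + di for wi, di in zip(w, diff)]
    return w


def preference_based_search(n_queries=25, n_candidates=200, seed=0):
    rng = random.Random(seed)
    best = [rng.uniform(-1, 1) for _ in TARGET]
    start = list(best)
    preferences = []  # (preferred, dispreferred) pairs collected so far
    w = [0.0] * (2 * len(TARGET))
    for _ in range(n_queries):
        # Select the candidate that the current surrogate ranks highest.
        pool = [[rng.uniform(-1, 1) for _ in TARGET]
                for _ in range(n_candidates)]
        candidate = max(pool, key=lambda th: surrogate(w, th))
        # One ranking query: new demonstration versus the previous best.
        if expert_prefers(candidate, best):
            preferences.append((candidate, best))
            best = candidate
        else:
            preferences.append((best, candidate))
        # Refine the approximate policy return from all rankings so far.
        w = fit_surrogate(preferences, len(w))
    return start, best, len(preferences)


if __name__ == "__main__":
    start, best, n = preference_based_search()
    print("queries:", n, "start:", start, "best:", best)
```

Because the incumbent is replaced only when the simulated expert prefers the new demonstration, the final policy is never worse than the starting one under the expert's hidden criterion. How quickly it improves depends on the surrogate and on the selection rule, which is the part an active ranking criterion is meant to improve.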