Many machine learning approaches in robotics, based on reinforcement learning, inverse optimal control, or direct policy learning, critically rely on robot simulators. This paper investigates simulator-free direct policy learning, called Preference-based Policy Learning (PPL). PPL iterates a four-step process: the robot demonstrates a candidate policy; the expert ranks this policy relative to previously demonstrated ones according to her preferences; these preferences are used to learn a policy return estimate; and the robot uses the policy return estimate to build new candidate policies. The process is iterated until the desired behavior is obtained. PPL requires that a good representation of the policy search space be available, one that enables accurate policy return estimates to be learned while limiting the human ranking effort needed to reach a good policy. Furthermore, this representation cannot use informed features (e.g., how far the robot is from any target) because of the simulator-free setting. As a second contribution, this paper therefore proposes a representation based on the agnostic exploitation of the robotic log. The convergence of PPL is analytically studied, and its experimental validation on two problems, one involving a single robot in a maze and one involving two interacting robots, is presented.
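The four-step loop described above can be illustrated with a toy sketch. Everything here is an assumption for illustration, not the paper's implementation: policies are plain parameter vectors, the expert's hidden preference is distance to a secret target, and the policy return estimate is a linear model fitted to preference pairs with perceptron-style updates. The estimate is used to pre-select which of several perturbed candidates the robot demonstrates, standing in for how the surrogate limits the expert's ranking effort.

```python
import random

def dist2(p, q):
    """Squared distance between two parameter vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def dot(w, p):
    return sum(wi * pi for wi, pi in zip(w, p))

def fit_return_estimate(pairs, dim, epochs=50, lr=0.1):
    """Learn a linear policy return estimate from (winner, loser)
    preference pairs, with perceptron-style updates on violated rankings.
    (A stand-in for the ranking-based surrogate learning in PPL.)"""
    w = [0.0] * dim
    for _ in range(epochs):
        for winner, loser in pairs:
            if dot(w, winner) <= dot(w, loser):  # ranking violated
                w = [wi + lr * (a - b) for wi, a, b in zip(w, winner, loser)]
    return w

def ppl(n_iters=30, n_candidates=5, dim=3, seed=0):
    rng = random.Random(seed)
    target = [0.5, -0.2, 0.8]  # hidden expert preference (toy assumption)
    current = [rng.uniform(-1, 1) for _ in range(dim)]
    d_init = dist2(current, target)
    pairs, w = [], [0.0] * dim
    for _ in range(n_iters):
        # Step 4 (from the previous iteration): the robot builds new
        # candidate policies; the return estimate picks the most
        # promising one, limiting how many the expert must rank.
        cands = [[c + rng.gauss(0, 0.3) for c in current]
                 for _ in range(n_candidates)]
        candidate = max(cands, key=lambda p: dot(w, p))
        # Steps 1-2: the candidate is demonstrated and the expert ranks
        # it against the current policy (here: closer to the hidden
        # target is preferred; the true score is never revealed).
        if dist2(candidate, target) < dist2(current, target):
            pairs.append((candidate, current))
            current = candidate
        else:
            pairs.append((current, candidate))
        # Step 3: the accumulated preferences are used to relearn
        # the policy return estimate.
        w = fit_return_estimate(pairs, dim)
    return d_init, dist2(current, target)
```

Since the current policy is replaced only when the expert prefers the demonstrated candidate, the (hidden) distance to the target behavior is non-increasing across iterations, so the loop can only improve on its random starting policy.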