Service robots are becoming increasingly available, and they are expected to take part in many human activities in the near future. These robots should adapt to the user's needs, so non-expert users will have to teach them new tasks in natural ways. This paper describes a new teaching-by-demonstration algorithm. It uses a Kinect® sensor to track the user's movements, eliminating the need for special sensors or environment conditions. It represents tasks relationally, which eases the correspondence problem between the user and the robot arm and lets the robot learn tasks at a more general level of description. It uses reinforcement learning to improve on the initial sequences provided by the user. Finally, it incorporates on-line feedback from the user during learning, creating a novel dynamic reward shaping mechanism that converges faster to an optimal policy. We demonstrate the approach on simple manipulation tasks with a robot arm and show that it outperforms more traditional reinforcement learning algorithms.
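The idea of folding on-line user feedback into the reward can be illustrated with a minimal sketch. This is not the algorithm from the paper, whose details the abstract does not give. It assumes plain tabular Q-learning and an additive feedback term, and the names shaped_q_update, user_feedback, alpha and gamma are hypothetical, as are the states, actions and numeric values.

```python
from collections import defaultdict


def shaped_q_update(q, state, action, next_state, actions, env_reward,
                    user_feedback=0.0, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step with an additive user-feedback term.

    Assumption: feedback is simply added to the environment reward.
    The paper's actual shaping mechanism may differ.
    """
    # Shaped reward: environment reward plus whatever the user signalled.
    reward = env_reward + user_feedback
    # Greedy value of the next state under the current estimates.
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    # Standard temporal-difference update toward the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical two-action example; unseen state-action pairs start at 0.
q = defaultdict(float)
actions = ["left", "right"]

# No environment reward and no feedback: the estimate stays at zero.
plain = shaped_q_update(q, "s0", "right", "s1", actions, env_reward=0.0)

# Same transition, but the user approves of the action (feedback +1).
praised = shaped_q_update(q, "s0", "left", "s1", actions,
                          env_reward=0.0, user_feedback=1.0)
```

In this sketch the feedback only changes the reward seen by a single update, so an action the user approves of gains value even when the environment itself gives no signal. That is one simple way feedback could speed up learning.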