Dynamic reward shaping: training a robot by voice

Authors:
Ana C. Tenorio-Gonzalez;Eduardo F. Morales;Luis Villaseñor-Pineda
Affiliations:
National Institute of Astrophysics, Optics and Electronics, Computer Science Department, Tonantzintla, México (all authors)
Venue:
IBERAMIA'10 Proceedings of the 12th Ibero-American conference on Advances in artificial intelligence
Year:
2010

Abstract

Reinforcement learning is commonly used for learning tasks in robotics; however, traditional algorithms can require very long training times. Reward shaping has recently been used to speed up convergence by encoding domain knowledge as extra rewards. Reward shaping functions are normally defined in advance by the user and are static. This paper introduces a dynamic reward shaping approach in which these extra rewards are not given consistently, can vary over time, and may sometimes be contrary to what is needed to achieve the goal. In the experiments, a user provides verbal feedback while a robot performs a task, and this feedback is translated into additional rewards. It is shown that convergence can still be guaranteed as long as most of the shaping rewards given per state are consistent with the goal, and that even with fairly noisy interaction the system can still converge faster than traditional reinforcement learning techniques.
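
The idea of adding inconsistent, time-varying feedback to the environment reward can be illustrated with a minimal sketch. It is not the authors' algorithm or experimental setup: it assumes tabular Q-learning on a hypothetical one-dimensional corridor, and the function `simulated_feedback` is an invented stand-in for the spoken feedback, with illustrative parameters `p_given` (how often the user speaks) and `p_wrong` (how often the feedback contradicts the goal).

```python
import random

LEFT, RIGHT = 0, 1


def simulated_feedback(action, rng, p_given=0.5, p_wrong=0.2, magnitude=1.0):
    """Hypothetical stand-in for spoken feedback (an assumption, not from the paper).

    The goal is at the right end of the corridor, so approving RIGHT is
    'consistent with the goal'. Feedback is sometimes missing and sometimes wrong.
    """
    if rng.random() >= p_given:
        return 0.0  # the user stays silent on this step
    approves = action == RIGHT
    if rng.random() < p_wrong:
        approves = not approves  # feedback contrary to the goal
    return magnitude if approves else -magnitude


def train(n_states=10, episodes=200, alpha=0.1, gamma=0.9, epsilon=0.1,
          use_feedback=True, max_steps=500, seed=0):
    """Tabular Q-learning where the update uses env_reward + shaping reward."""
    rng = random.Random(seed)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]
    steps_per_episode = []
    for _ in range(episodes):
        state, steps = 0, 0
        while state != goal and steps < max_steps:
            if rng.random() < epsilon:
                action = rng.choice((LEFT, RIGHT))  # explore
            else:
                best = max(q[state])
                # Greedy action, breaking ties at random.
                action = rng.choice(
                    [a for a in (LEFT, RIGHT) if q[state][a] == best])
            next_state = max(0, state - 1) if action == LEFT else state + 1
            env_reward = 1.0 if next_state == goal else 0.0
            shaping = simulated_feedback(action, rng) if use_feedback else 0.0
            # The goal state is terminal: its Q-values are never updated and stay 0.
            target = env_reward + shaping + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state, steps = next_state, steps + 1
        steps_per_episode.append(steps)
    return q, steps_per_episode


if __name__ == "__main__":
    for flag in (False, True):
        _, steps = train(use_feedback=flag)
        early = sum(steps[:20]) / 20
        print(f"use_feedback={flag}: mean steps over first 20 episodes = {early:.1f}")
```

Running the script prints the average episode length early in training with and without the simulated feedback, so the two settings can be compared; raising `p_wrong` toward 0.5 or beyond is one way to explore how much contradictory feedback such a learner tolerates in this toy setting.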