Combining manual feedback with subsequent MDP reward signals for reinforcement learning

Authors:
W. Bradley Knox;Peter Stone
Affiliations:
University of Texas at Austin;University of Texas at Austin
Venue:
Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems: Volume 1
Year:
2010

Abstract

As learning agents move from research labs to the real world, it is increasingly important that human users, including those without programming skills, be able to teach agents desired behaviors. Recently, the TAMER framework was introduced for designing agents that can be interactively shaped by human trainers who give only positive and negative feedback signals. Past work on TAMER showed that shaping can greatly reduce the sample complexity required to learn a good policy, can enable lay users to teach agents the behaviors they desire, and can allow agents to learn within a Markov Decision Process (MDP) in the absence of a coded reward function. However, TAMER does not allow this human training to be combined with autonomous learning based on such a coded reward function. This paper leverages the fast learning exhibited within the TAMER framework to hasten a reinforcement learning (RL) algorithm's climb up the learning curve, effectively demonstrating that human reinforcement and MDP reward can be used in conjunction with one another by an autonomous agent. We tested eight plausible TAMER+RL methods for combining a previously learned human reinforcement function, H, with MDP reward in a reinforcement learning algorithm. This paper identifies which of these methods are most effective and analyzes their strengths and weaknesses. Results from these TAMER+RL algorithms indicate better final performance and better cumulative performance than either a TAMER agent or an RL agent alone.
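The abstract does not list the eight combination methods, so the following is only an illustrative sketch of one plausible way a previously learned H could enter an RL update: adding a weighted H to the MDP reward before a tabular Q-learning step. The function name `shaped_q_update`, the weight `beta`, the toy `h`, and the choice of Q-learning are all assumptions made for illustration and are not taken from the paper.

```python
from collections import defaultdict


def shaped_q_update(q, h, state, action, reward, next_state, actions,
                    alpha=0.1, gamma=0.9, beta=1.0):
    """One tabular Q-learning step on the shaped reward R + beta * H.

    q       : mapping (state, action) -> estimated value
    h       : callable (state, action) -> predicted human reinforcement
              (stands in for the previously learned H; hypothetical)
    beta    : assumed weight on H; beta = 0 recovers plain Q-learning
    """
    # Assumed combination rule: augment the MDP reward with weighted H.
    shaped = reward + beta * h(state, action)
    # Greedy bootstrap target over the available actions.
    best_next = max(q[(next_state, a)] for a in actions)
    td_error = shaped + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return q[(state, action)]


# Toy usage: H prefers the action "right"; the MDP reward is zero here.
q = defaultdict(float)
h = lambda s, a: 1.0 if a == "right" else -1.0
value = shaped_q_update(q, h, state=0, action="right", reward=0.0,
                        next_state=1, actions=["left", "right"])
```

In this sketch the human signal alone moves the value estimate for the preferred action away from zero even though the MDP reward is zero, which is the kind of early boost up the learning curve the abstract describes. Setting `beta` to zero leaves an ordinary Q-learning update.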