As computational agents are increasingly used beyond research labs, their success will depend on their ability to learn new skills and adapt to their dynamic, complex environments. If human users---without programming skills---can transfer their task knowledge to agents, learning can accelerate dramatically, reducing costly trials. The TAMER framework guides the design of agents whose behavior can be shaped through signals of approval and disapproval, a natural form of human feedback. More recently, TAMER+RL was introduced to enable human feedback to augment a traditional reinforcement learning (RL) agent that learns from a Markov decision process's (MDP) reward signal. We address limitations of prior work on TAMER and TAMER+RL, contributing in two critical directions. First, the four successful techniques for combining human reward with RL from prior TAMER+RL work are tested on a second task, and these techniques' sensitivities to parameter changes are analyzed. Together, these examinations yield more general and prescriptive conclusions to guide others who wish to incorporate human knowledge into an RL algorithm. Second, TAMER+RL has thus far been limited to a sequential setting, in which training occurs before learning from MDP reward. In this paper, we introduce a novel algorithm that shares the same spirit as TAMER+RL but learns simultaneously from both reward sources, enabling the human feedback to come at any time during the reinforcement learning process. We call this algorithm Simultaneous TAMER+RL. To enable simultaneous learning, we introduce a new technique that appropriately determines the magnitude of the human model's influence on the RL algorithm throughout time and state-action space.
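To make the combination scheme concrete, the sketch below illustrates one of the simpler combination techniques from the TAMER+RL line of work, action biasing: the agent acts greedily on Q(s,a) + β·Ĥ(s,a), where Ĥ is a model of human reward and β anneals over time so the MDP reward eventually dominates. Everything here is a toy assumption---the chain MDP, the hand-coded Ĥ table (in TAMER, Ĥ would be learned from the trainer's approval/disapproval signals), and the annealing schedule---not the paper's actual algorithm or its state-dependent influence technique.

```python
# Toy chain MDP: states 0..4, actions {-1, +1}; reaching state 4 yields reward 1.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, 1)

def step(s, a):
    s2 = min(max(s + a, 0), GOAL)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

# Q-values learned from MDP reward, plus a hypothetical human-reward model H
# that happens to endorse moving right (stand-in for a learned TAMER model).
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
H = {(s, a): (1.0 if a == 1 else -1.0) for s in range(N_STATES) for a in ACTIONS}

alpha, gamma, beta = 0.5, 0.9, 1.0  # beta scales the human model's influence

for episode in range(200):
    s = 0
    for t in range(20):
        # Action biasing: act greedily on Q(s,a) + beta * H(s,a).
        a = max(ACTIONS, key=lambda a: Q[(s, a)] + beta * H[(s, a)])
        s2, r, done = step(s, a)
        target = r + (0.0 if done else gamma * max(Q[(s2, b)] for b in ACTIONS))
        Q[(s, a)] += alpha * (target - Q[(s, a)])  # ordinary Q-learning update
        s = s2
        if done:
            break
    beta *= 0.98  # anneal the human model's influence over time

# With beta nearly annealed away, Q alone should prefer moving right.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)}
print(policy)
```

The key design point mirrored here is that the human model only biases exploration and action selection; the Q-function is still updated solely from MDP reward, so annealing β leaves a standard RL agent behind.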