This paper considers the problem of extending Training an Agent Manually via Evaluative Reinforcement (TAMER) to continuous state and action spaces. The TAMER framework enables a non-technical human to train an agent through a natural form of human feedback (positive or negative). The advantages of TAMER have been demonstrated on tasks in which agents are trained either by human feedback alone or by human feedback combined with environment rewards. However, these methods were originally designed for problems with discrete states and actions, or with continuous states and discrete actions. This paper proposes ACTAMER, an extension of TAMER that allows both continuous states and actions. The new framework can use any general function approximator to model the human trainer's feedback signal. Moreover, the combination of ACTAMER with reinforcement learning is also investigated and evaluated, in both sequential and simultaneous settings. Our experimental results demonstrate that the proposed method successfully allows a human to train an agent in two continuous state-action domains: Mountain Car and Cart-pole (balancing).
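To make the TAMER-style learning loop concrete, the sketch below shows one possible (hypothetical, greatly simplified) setup: a linear function approximator learns a model of the human's scalar feedback over continuous state-action pairs, and the agent acts greedily with respect to that model by maximizing over sampled candidate actions. The feature map, learning rate, and class names here are illustrative assumptions, not the paper's actual ACTAMER implementation, which permits any general function approximator.

```python
import numpy as np

class LinearHumanModel:
    """Hypothetical linear model H(s, a) ~ w . phi(s, a) of human feedback."""

    def __init__(self, n_features, lr=0.1):
        self.w = np.zeros(n_features)   # weight vector of the approximator
        self.lr = lr                    # learning-rate assumption

    def features(self, state, action):
        # Illustrative feature map: raw state/action values plus
        # state-action product terms (an assumption for this sketch).
        sa = np.concatenate([state, action])
        return np.concatenate([sa, np.outer(state, action).ravel()])

    def predict(self, state, action):
        # Predicted human feedback for a state-action pair.
        return self.w @ self.features(state, action)

    def update(self, state, action, human_feedback):
        # Gradient step toward the observed scalar feedback (e.g. +1 / -1).
        err = human_feedback - self.predict(state, action)
        self.w += self.lr * err * self.features(state, action)

def greedy_action(model, state, candidates):
    # With a continuous action space, exact maximization is impractical,
    # so this sketch maximizes over a finite sample of candidate actions.
    preds = [model.predict(state, a) for a in candidates]
    return candidates[int(np.argmax(preds))]
```

After a positive feedback signal on some action, the greedy policy shifts toward actions whose features resemble the rewarded one; in a sequential or simultaneous combination with reinforcement learning, this predicted human reward would be blended with the environment's reward signal.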