Batch reinforcement learning in a complex domain

Authors:
Shivaram Kalyanakrishnan; Peter Stone
Affiliations:
The University of Texas at Austin; The University of Texas at Austin
Venue:
Proceedings of the 6th International Joint Conference on Autonomous Agents and Multiagent Systems
Year:
2007

Abstract

Temporal difference reinforcement learning algorithms are well suited to autonomous agents because they learn directly from the experience an agent gathers by acting sequentially in its environment. However, their most common variants use experience data relatively inefficiently, and in many agent-based settings such data is scarce. In particular, they make just one learning "update" for each atomic experience. Batch reinforcement learning algorithms, on the other hand, aim for greater data efficiency by saving experience data and using it in aggregate to update the learned policy. Their success has previously been demonstrated on simple domains such as grid worlds and on low-dimensional control applications such as pole balancing. In this paper, we compare batch reinforcement learning algorithms with on-line algorithms on the basis of their empirical performance in a complex, continuous, noisy, multiagent domain, namely RoboCup soccer Keepaway. We find that the two batch methods we consider, Experience Replay and Fitted Q Iteration, both yield significant gains in sample efficiency while achieving high asymptotic performance.
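
The contrast between one update per experience and repeated updates over saved experience can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a tabular Q-function, a hypothetical three-state chain with a single action, and illustrative values for the step size `alpha` and discount `gamma`, whereas Keepaway is continuous and requires function approximation. All function names below are invented for the illustration, and Fitted Q Iteration is not shown.

```python
from collections import defaultdict


def q_update(Q, transition, actions, alpha=0.5, gamma=0.9):
    """Apply one Q-learning update for a single (s, a, r, s_next, done) tuple."""
    s, a, r, s_next, done = transition
    # Bootstrapped target: immediate reward plus discounted best next value.
    target = r if done else r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])


def online_pass(transitions, actions, **kw):
    """On-line learning: each experience is used for exactly one update."""
    Q = defaultdict(float)
    for t in transitions:
        q_update(Q, t, actions, **kw)
    return Q


def experience_replay(transitions, actions, sweeps=50, **kw):
    """Batch learning: store the experiences and sweep over them repeatedly."""
    Q = defaultdict(float)
    buffer = list(transitions)  # saved experience data
    for _ in range(sweeps):
        for t in buffer:
            q_update(Q, t, actions, **kw)
    return Q


# Hypothetical chain 0 -> 1 -> 2 (terminal), with reward 1 on the final step.
ACTIONS = ["right"]
EPISODE = [(0, "right", 0.0, 1, False), (1, "right", 1.0, 2, True)]

q_online = online_pass(EPISODE, ACTIONS)
q_replay = experience_replay(EPISODE, ACTIONS)
```

With the same two transitions, the single on-line pass leaves the value of the start state at zero, because the reward has not yet been observed when that transition is processed. Replaying the stored transitions propagates the reward back to the start state, whose value approaches `gamma` times the terminal reward under the assumed settings.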