Sample-efficient evolutionary function approximation for reinforcement learning

Authors:
Shimon Whiteson;Peter Stone
Affiliations:
Department of Computer Sciences, University of Texas at Austin, Austin, TX;Department of Computer Sciences, University of Texas at Austin, Austin, TX
Venue:
AAAI'06 Proceedings of the 21st national conference on Artificial intelligence - Volume 1
Year:
2006

Citing 15
Cited 1

Learning internal representations by error propagation

Parallel distributed processing: explorations in the microstructure of cognition, vol. 1
Self-Improving Reactive Agents Based on Reinforcement Learning, Planning and Teaching

Machine Learning
Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time

Machine Learning
TD-Gammon, a self-teaching backgammon program, achieves master-level play

Neural Computation
Elevator Group Control Using Multiple Reinforcement Learning Agents

Machine Learning
Gradient descent for general reinforcement learning

Proceedings of the 1998 conference on Advances in neural information processing systems II
Genetic Algorithms in Search, Optimization and Machine Learning

Genetic Algorithms in Search, Optimization and Machine Learning
Introduction to Reinforcement Learning

Introduction to Reinforcement Learning
Evolving neural networks through augmenting topologies

Evolutionary Computation
The Vision of Autonomic Computing

Computer
Least-squares policy iteration

The Journal of Machine Learning Research
Utility Functions in Autonomic Systems

ICAC '04 Proceedings of the First International Conference on Autonomic Computing
Evolutionary Function Approximation for Reinforcement Learning

The Journal of Machine Learning Research
Machine learning for fast quadrupedal locomotion

AAAI'04 Proceedings of the 19th national conference on Artifical intelligence
Neural fitted q iteration – first experiences with a data efficient neural reinforcement learning method

ECML'05 Proceedings of the 16th European conference on Machine Learning

Empirical Studies in Action Selection with Reinforcement Learning

Adaptive Behavior - Animals, Animats, Software Agents, Robots, Adaptive Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

Reinforcement learning problems are commonly tackled with temporal difference methods, which attempt to estimate the agent's optimal value function. In most real-world problems, learning this value function requires a function approximator, which maps state-action pairs to values via a concise, parameterized function. In practice, the success of function approximators depends on the ability of the human designer to select an appropriate representation for the value function. A recently developed approach called evolutionary function approximation uses evolutionary computation to automate the search for effective representations. While this approach can substantially improve the performance of TD methods, it requires many sample episodes to do so. We present an enhancement to evolutionary function approximation that makes it much more sample-efficient by exploiting the off-policy nature of certain TD methods. Empirical results in a server job scheduling domain demonstrate that the enhanced method can learn better policies than evolution or TD methods alone and can do so in many fewer episodes than standard evolutionary function approximation.