Temporal difference and evolutionary methods are two of the most common approaches to solving reinforcement learning problems. However, there is little consensus on their relative merits, and few empirical studies have directly compared their performance. This article aims to address this shortcoming by presenting results of empirical comparisons between Sarsa and NEAT, two representative methods, in mountain car and keepaway, two benchmark reinforcement learning tasks. In each task, the methods are evaluated in combination with both linear and nonlinear representations to determine their best configurations. In addition, this article tests two specific hypotheses about the critical factors contributing to these methods' relative performance: (1) that sensor noise reduces the final performance of Sarsa more than that of NEAT, because Sarsa's learning updates are unreliable in the absence of the Markov property; and (2) that stochasticity, by introducing noise into fitness estimates, reduces the learning speed of NEAT more than that of Sarsa. Experiments in variations of mountain car and keepaway designed to isolate these factors confirm both hypotheses.
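To make hypothesis (1) concrete, the sketch below shows the linear Sarsa update at the heart of the comparison: the weight change is driven by a temporal-difference error computed from the observed transition, so noisy (non-Markov) observations corrupt every update rather than merely adding variance to a whole-episode fitness score, as they would for NEAT. This is an illustrative sketch, not the authors' implementation; the function name, signature, and default step sizes are assumptions.

```python
import numpy as np

def sarsa_update(w, phi_sa, r, phi_next_sa, alpha=0.1, gamma=1.0, done=False):
    """One Sarsa update for a linear action-value function Q(s, a) = w . phi(s, a).

    w           -- weight vector (modified in place)
    phi_sa      -- feature vector for the current state-action pair
    r           -- observed reward
    phi_next_sa -- feature vector for the next state-action pair (ignored if done)

    Returns the TD error; with noisy sensors, phi_sa and phi_next_sa misrepresent
    the true state, so this error (and hence every weight update) is biased.
    """
    q = w @ phi_sa
    q_next = 0.0 if done else w @ phi_next_sa
    td_error = r + gamma * q_next - q  # bootstrapped one-step TD error
    w += alpha * td_error * phi_sa     # gradient step on the squared TD error
    return td_error
```

For example, with zero initial weights, a reward of -1 (as in mountain car, where every step is penalized) yields a TD error of -1 and shifts the weight on the active feature downward by `alpha`.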