On-line evolutionary computation for reinforcement learning in stochastic domains

Authors:
Shimon Whiteson;Peter Stone
Affiliations:
University of Texas at Austin, Austin, TX;University of Texas at Austin, Austin, TX
Venue:
Proceedings of the 8th annual conference on Genetic and evolutionary computation
Year:
2006

Abstract

In reinforcement learning, an agent interacting with its environment strives to learn a policy that specifies, for each state it may encounter, what action to take. Evolutionary computation is one of the most promising approaches to reinforcement learning, but its success is largely restricted to off-line scenarios. In on-line scenarios, an agent must strive to maximize the reward it accrues while it is learning. Temporal difference (TD) methods, another approach to reinforcement learning, naturally excel in on-line scenarios because they have selection mechanisms for balancing the need to search for better policies (exploration) with the need to accrue maximal reward (exploitation). This paper presents a novel way to strike this balance in evolutionary methods: it borrows the selection mechanisms that TD methods use to choose individual actions and applies them in evolution to choose which policies to evaluate. Empirical results in the mountain car and server job scheduling domains demonstrate that these techniques can substantially improve evolution's on-line performance in stochastic domains.
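
The idea of reusing a TD-style selection mechanism to pick policies for evaluation can be illustrated with a minimal sketch. The abstract does not say which mechanisms the paper uses, so the example below assumes epsilon-greedy selection, a common choice in TD methods. The names `select_policy`, `run_generation`, and `evaluate` are hypothetical and do not come from the paper. Within one generation, each episode is allocated either to a randomly chosen member of the population (exploration) or to the member with the best average fitness so far (exploitation).

```python
import random


def select_policy(avg_fitness, epsilon, rng):
    """Pick the index of the policy to evaluate in the next episode.

    Assumption: epsilon-greedy selection, applied to whole policies
    instead of individual actions.
    """
    if rng.random() < epsilon:
        # Explore: evaluate a policy chosen uniformly at random.
        return rng.randrange(len(avg_fitness))
    # Exploit: evaluate the policy with the best average fitness so far.
    return max(range(len(avg_fitness)), key=lambda i: avg_fitness[i])


def run_generation(evaluate, population_size, episodes, epsilon, seed=0):
    """Allocate the episodes of one generation among the population.

    `evaluate(i, rng)` is a hypothetical callback that runs policy i for
    one episode and returns its (possibly noisy) reward.
    """
    rng = random.Random(seed)
    avg_fitness = [0.0] * population_size
    counts = [0] * population_size
    total_reward = 0.0
    for _ in range(episodes):
        i = select_policy(avg_fitness, epsilon, rng)
        reward = evaluate(i, rng)
        counts[i] += 1
        # Incremental mean keeps a running fitness estimate per policy.
        avg_fitness[i] += (reward - avg_fitness[i]) / counts[i]
        total_reward += reward
    return avg_fitness, counts, total_reward
```

With `epsilon = 1.0` this sketch spends every episode on a randomly chosen policy, with no regard for the reward earned during learning. Smaller values shift episodes toward policies that already look good, which is the on-line trade-off the abstract describes. The averaged fitness estimates would then feed an ordinary evolutionary step (selection, crossover, mutation), which is omitted here.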