Dynamic Multi-Armed Bandits and Extreme Value-Based Rewards for Adaptive Operator Selection in Evolutionary Algorithms

  • Authors:
  • Álvaro Fialho; Luís Da Costa; Marc Schoenauer; Michèle Sebag

  • Affiliations:
  • Álvaro Fialho: Microsoft Research --- INRIA Joint Centre, Orsay, France
  • Luís Da Costa: TAO team, INRIA Saclay --- Île-de-France & LRI (UMR CNRS 8623), Orsay, France
  • Marc Schoenauer: Microsoft Research --- INRIA Joint Centre, Orsay, France and TAO team, INRIA Saclay --- Île-de-France & LRI (UMR CNRS 8623), Orsay, France
  • Michèle Sebag: Microsoft Research --- INRIA Joint Centre, Orsay, France and TAO team, INRIA Saclay --- Île-de-France & LRI (UMR CNRS 8623), Orsay, France

  • Venue:
  • Learning and Intelligent Optimization
  • Year:
  • 2009

Abstract

The performance of many efficient algorithms critically depends on the tuning of their parameters, which in turn depends on the problem at hand. For example, the performance of Evolutionary Algorithms critically depends on a judicious setting of the operator rates. The Adaptive Operator Selection (AOS) heuristic proposed here rewards each operator based on the extreme value of the fitness improvements recently brought by that operator, and uses a Multi-Armed Bandit (MAB) selection process based on those rewards to choose which operator to apply next. This Extreme-based Multi-Armed Bandit approach is experimentally validated against the Average-based MAB method, and is shown to outperform previously published methods, whether they use a classical Average-based rewarding technique or the same Extreme-based mechanism. The validation test suite includes the easy One-Max problem and a family of hard problems known as "Long k-paths".
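
To make the mechanism concrete, the following is a minimal Python sketch of the two ingredients the abstract describes: extreme-value credit assignment (an operator's reward is the maximum fitness improvement observed over a sliding window of its recent applications) and a UCB-style bandit that selects the next operator. The class name `ExtremeMAB`, the window size, and the exploration coefficient `c` are illustrative assumptions, not values from the paper; the dynamic restart mechanism that gives the title its "Dynamic Multi-Armed Bandits" is also omitted here for brevity.

```python
import math
import random
from collections import deque

class ExtremeMAB:
    """Illustrative sketch of Extreme-value-based Adaptive Operator Selection.

    Each operator keeps a sliding window of its recent fitness improvements;
    its reward is the window maximum (the extreme value). A UCB-style
    Multi-Armed Bandit then trades off exploiting the best operator against
    exploring the others. `window` and `c` are illustrative defaults.
    """

    def __init__(self, n_operators, window=50, c=1.0):
        self.n = n_operators
        self.c = c                                   # exploration coefficient
        self.windows = [deque(maxlen=window) for _ in range(n_operators)]
        self.counts = [0] * n_operators              # applications per operator

    def select(self):
        # Apply each operator once before using the UCB formula.
        for op in range(self.n):
            if self.counts[op] == 0:
                return op
        total = sum(self.counts)

        def ucb(op):
            # Reward = extreme (max) improvement in the sliding window.
            reward = max(self.windows[op]) if self.windows[op] else 0.0
            return reward + self.c * math.sqrt(2 * math.log(total) / self.counts[op])

        return max(range(self.n), key=ucb)

    def update(self, op, fitness_improvement):
        # Credit assignment: record the (non-negative) improvement.
        self.counts[op] += 1
        self.windows[op].append(max(0.0, fitness_improvement))


# Toy usage with two hypothetical operators; the random improvement is a
# stand-in for actually applying a variation operator to an individual.
aos = ExtremeMAB(n_operators=2)
for step in range(1000):
    op = aos.select()
    improvement = random.random() * (op + 1)
    aos.update(op, improvement)
```

Using the window maximum rather than the mean favors operators that occasionally yield large jumps in fitness, which is the rationale for the Extreme-based reward over the Average-based one compared in the experiments.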