Using expectation-maximization for reinforcement learning
Neural Computation
Natural gradient works efficiently in learning
Neural Computation
2010 Special Issue: Parameter-exploring policy gradients
Neural Networks
Bidirectional relation between CMA evolution strategies and natural evolution strategies
PPSN'10 Proceedings of the 11th international conference on Parallel problem solving from nature: Part I
Towards the geometry of estimation of distribution algorithms based on the exponential family
Proceedings of the 11th workshop proceedings on Foundations of genetic algorithms
Ant system: optimization by a colony of cooperating agents
IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
PPSN'12 Proceedings of the 12th international conference on Parallel Problem Solving from Nature - Volume Part I
Hi-index | 0.00 |
Information-Geometric Optimization (IGO) is a unified framework of stochastic algorithms for optimization problems. Given a family of probability distributions, IGO turns the original optimization problem into a new maximization problem on the parameter space of the probability distributions. IGO updates the parameter of the probability distribution along the natural gradient, taken with respect to the Fisher metric on the parameter manifold, aiming at maximizing an adaptive transform of the objective function. IGO recovers several known algorithms as particular instances: for the family of Bernoulli distributions IGO recovers PBIL, for the family of Gaussian distributions the pure rank-μ CMA-ES update is recovered, and for exponential families in expectation parametrization the cross-entropy/ML method is recovered. This article provides a theoretical justification for the IGO framework, by proving that any step size not greater than 1 guarantees monotone improvement over the course of optimization, in terms of q-quantile values of the objective function f. The range of admissible step sizes is independent of f and its domain. We extend the result to cover the case of different step sizes for blocks of the parameters in the IGO algorithm. Moreover, we prove that expected fitness improves over time when fitness-proportional selection is applied, in which case the RPP algorithm is recovered.