Crossover, sampling, bloat and the harmful effects of size limits

  • Authors: Stephen Dignum; Riccardo Poli

  • Affiliations: Department of Computing and Electronic Systems, University of Essex, Colchester, UK (both authors)

  • Venue: EuroGP'08: Proceedings of the 11th European Conference on Genetic Programming

  • Year: 2008

Abstract

Recent research [9,2] has enabled the accurate prediction of the limiting distribution of tree sizes for Genetic Programming (GP) with standard subtree-swapping crossover when GP is applied to a flat fitness landscape. In that work, however, tree sizes are measured in terms of the number of internal nodes. While the relationship between internal nodes and length is one-to-one for a-ary trees, it is much more complex for mixed arities. So, in practice, the length bias of subtree crossover remains unknown. This paper starts to fill this theoretical gap by providing accurate estimates of the limiting distribution of lengths approached by tree-based GP with standard crossover in the absence of selection pressure. The resulting models predict that short programs will be heavily resampled, and empirical validation confirms this. We also study empirically how the situation changes when program length limits are applied. Surprisingly, introducing such limits further exacerbates the resampling effect, with more profound consequences than one might first imagine. We analyse these consequences and predict that, in the presence of fitness, size limits may initially speed up bloat, almost completely defeating their original purpose of combating it. Indeed, experiments confirm that this is the case for the first 10 or 15 generations. This leads us to suggest a better way of using size limits. Finally, this paper proposes a novel technique to counteract bloat: sampling parsimony, the application of a penalty to resampling.
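
The abstract's point about internal nodes versus length can be illustrated with a small sketch. It assumes that "length" means the total number of nodes in a tree, and it relies only on the fact that every node except the root is the child of exactly one internal node. The function names and the arity set {1, 2, 3} are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import itertools


def program_length(internal_arities):
    """Total node count of a tree, given the arity of each internal node.

    Every non-root node is a child of exactly one internal node,
    so length = 1 (the root) + sum of the internal nodes' arities.
    """
    return 1 + sum(internal_arities)


def length_a_ary(n_internal, a):
    """Length of a tree whose n_internal internal nodes all have arity a."""
    return a * n_internal + 1


# Fixed arity (here a = 2): the internal-node count determines the length.
binary_lengths = [length_a_ary(n, 2) for n in range(4)]

# Mixed arities (illustrative set {1, 2, 3}): trees with the same number of
# internal nodes (two) can have several different lengths.
mixed_lengths = sorted(
    {program_length(combo) for combo in itertools.product((1, 2, 3), repeat=2)}
)

print(binary_lengths)  # [1, 3, 5, 7]
print(mixed_lengths)   # [3, 4, 5, 6, 7]
```

With a single arity, each internal-node count maps to exactly one length, so a size distribution over internal nodes translates directly into a length distribution. With mixed arities the mapping is one-to-many, which is why a separate model of the length distribution is needed.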