Generalisation of the limiting distribution of program sizes in tree-based genetic programming and analysis of its effects on bloat

  • Authors:
  • Stephen Dignum;Riccardo Poli

  • Affiliations:
  • University of Essex;University of Essex

  • Venue:
  • Proceedings of the 9th annual conference on Genetic and evolutionary computation
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

Recent research [1] has found that standard sub-tree crossover with uniform selection of crossover points, in the absence of fitness pressure, pushes a population of GP trees towards a Lagrange distribution of tree sizes. However, the result applied to the case of single arity function plus leaf node combinations, e.g., unary, binary, ternary, etc trees only. In this paper we extend those findings and show that the same distribution is also applicable to the more general case where the function set includes functions of mixed arities. We also provide empirical evidence that strongly corroborates this generalisation. Both predicted and observed results show a distinct bias towards the sampling of shorter programs irrespective of the mix of function arities used. Practical applications and implications of this knowledge are investigated with regard to search efficiency and program bloat. Work is also presented regarding the applicability of the theory to the traditional 90%-function 10%-terminal crossover node selection policy.