Repeated patterns in genetic programming

  • Authors:
  • W. B. Langdon; W. Banzhaf

  • Affiliations:
  • Department of Computer Science, Essex Institute of Technology, University of Essex, Colchester, UK CO4 3SQ; Department of Computer Science, Memorial University of Newfoundland, St. John's, Canada A1B 3X5

  • Venue:
  • Natural Computing: an international journal
  • Year:
  • 2008

Abstract

Evolved genetic programming trees contain many repeated code fragments. Size fair crossover limits bloat in automatic programming, preventing the evolution of recurring motifs. We examine these complex properties in detail using depth vs. size Catalan binary tree shape plots, subgraph and subtree matching, information entropy, sensitivity analysis, syntactic and semantic fitness correlations. Programs evolve in a self-similar fashion, akin to fractal random trees, with diffuse introns. Data mining frequent patterns reveals that as software is progressively improved a large proportion of it is exactly repeated subtrees as well as exactly repeated subgraphs. We relate this emergent phenomenon to building blocks in GP and suggest GP works by jumbling subtrees which already have high fitness on the whole problem to give incremental improvements and create complete solutions with multiple identical components of different importance.
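
To make the notion of "exactly repeated subtrees" concrete, the following is a minimal sketch of how such repeats might be counted in a single program tree. It is not the authors' tooling: the nested-tuple tree encoding (operator first, then children), the function names `count_subtrees`, `subtree_size` and `repeated_subtrees`, and the `min_size` threshold are all assumptions made for illustration.

```python
from collections import Counter


def count_subtrees(tree, counts=None):
    """Count occurrences of every subtree (leaves included).

    Assumed encoding: a leaf is a string; an internal node is a tuple
    (operator, child_1, ..., child_n). Tuples are hashable, so an
    identical subtree always maps to the same Counter key.
    """
    if counts is None:
        counts = Counter()
    counts[tree] += 1
    if isinstance(tree, tuple):
        for child in tree[1:]:  # tree[0] is the operator, not a subtree
            count_subtrees(child, counts)
    return counts


def subtree_size(tree):
    """Number of nodes (operators plus leaves) in a subtree."""
    if not isinstance(tree, tuple):
        return 1
    return 1 + sum(subtree_size(child) for child in tree[1:])


def repeated_subtrees(tree, min_size=2):
    """Return subtrees of at least min_size nodes that occur more than once."""
    counts = count_subtrees(tree)
    return {
        sub: n
        for sub, n in counts.items()
        if n > 1 and subtree_size(sub) >= min_size
    }


# Hypothetical evolved expression: (x * y) + ((x * y) - x)
example = ("+", ("*", "x", "y"), ("-", ("*", "x", "y"), "x"))
print(repeated_subtrees(example))
```

Here the fragment `("*", "x", "y")` appears twice, so it is the only repeated subtree of two or more nodes. Matching repeated subgraphs (fragments that need not extend down to the leaves) is a harder problem and is not covered by this sketch.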