A bootstrapping approach to reduce over-fitting in genetic programming

  • Authors:
  • Jeannie Fitzgerald; R. Muhammad Atif Azad; Conor Ryan

  • Affiliations:
  • University of Limerick, Limerick, Ireland (all authors)

  • Venue:
  • Proceedings of the 15th Annual Conference Companion on Genetic and Evolutionary Computation
  • Year:
  • 2013

Abstract

Historically, the quality of a solution in Genetic Programming (GP) was often assessed based on its performance on a given training sample. In Machine Learning, however, we are more interested in obtaining reliable estimates of the quality of the evolving individuals on unseen data. In this paper, we propose to simulate the effect of unseen data during training without actually using any additional data. We do this by employing a technique called bootstrapping, which repeatedly re-samples with replacement from the training data and helps estimate the sensitivity of the individual in question to small variations across these re-sampled data sets. We minimise this sensitivity, as measured by the Bootstrap Standard Error, together with the training error, in an effort to evolve models that generalise better to unseen data. We evaluate the proposed technique on four binary classification problems and compare it with a standard GP approach. The results show that, for the problems undertaken, the proposed method not only generalises significantly better than standard GP while training performance improves, but also exhibits a strong side effect of containing tree sizes.
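The resampling idea described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `individual_error` callable, the number of resamples, and the additive weighting of the two objectives are all assumptions made for the sketch.

```python
import random
import statistics

def bootstrap_standard_error(individual_error, data, n_resamples=50, seed=0):
    """Estimate an individual's sensitivity to small variations in the
    training data: re-sample with replacement, measure the error on each
    resample, and return the standard deviation of those errors.

    `individual_error(sample)` is a hypothetical callable that returns the
    individual's error on a list of training cases.
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(n_resamples):
        # Bootstrap resample: same size as the data, drawn with replacement.
        sample = [rng.choice(data) for _ in data]
        errors.append(individual_error(sample))
    return statistics.stdev(errors)

def combined_fitness(individual_error, data, weight=1.0):
    """One plausible way to minimise training error together with the
    Bootstrap Standard Error (the paper does not fix the combination)."""
    return individual_error(data) + weight * bootstrap_standard_error(
        individual_error, data)
```

An individual whose error fluctuates strongly across resamples receives a larger penalty, so selection favours models whose performance is stable under perturbations of the training set.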