Historically, the quality of a solution in Genetic Programming (GP) was often assessed by its performance on a given training sample. In Machine Learning, however, we are more interested in reliable estimates of the quality of the evolving individuals on unseen data. In this paper, we propose to simulate the effect of unseen data during training without actually using any additional data. We do this by employing bootstrapping, a technique that repeatedly re-samples with replacement from the training data and thereby estimates the sensitivity of the individual in question to small variations across these re-sampled data sets. We minimise this sensitivity, as measured by the Bootstrap Standard Error, together with the training error, in an effort to evolve models that generalise better to unseen data. We evaluate the proposed technique on four binary classification problems and compare it with a standard GP approach. The results show that, for the problems undertaken, the proposed method not only generalises significantly better than standard GP while also improving training performance, but also has the notable side effect of containing tree sizes.
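The core measurement described above can be sketched as follows. This is an illustrative Python sketch, not the authors' implementation: `evaluate` (an error function for a candidate individual), the number of resamples `n_boot`, and the equal-weight combination of training error and Bootstrap Standard Error are all assumptions made for the example.

```python
import random


def bootstrap_standard_error(evaluate, data, n_boot=200, seed=0):
    """Estimate the Bootstrap Standard Error (BSE) of an individual's error.

    evaluate: callable mapping a list of training cases to an error rate
              for the individual being assessed (hypothetical interface).
    data:     the original training cases.
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(n_boot):
        # Re-sample with replacement from the training data, same size.
        resample = [data[rng.randrange(len(data))] for _ in data]
        errors.append(evaluate(resample))
    mean = sum(errors) / n_boot
    # Sample standard deviation of the bootstrap error estimates.
    variance = sum((e - mean) ** 2 for e in errors) / (n_boot - 1)
    return variance ** 0.5


def combined_fitness(evaluate, data):
    """Minimise training error together with BSE (equal weighting assumed)."""
    return evaluate(data) + bootstrap_standard_error(evaluate, data)
```

An individual whose error rate barely changes across the re-sampled data sets receives a BSE near zero, so selection pressure favours individuals that are both accurate and insensitive to small perturbations of the training sample.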