Balancing learning and overfitting in genetic programming with interleaved sampling of training data

  • Authors:
  • Ivo Gonçalves; Sara Silva

  • Affiliations:
  • CISUC, Department of Informatics Engineering, University of Coimbra, Portugal; INESC-ID Lisboa, IST, Technical University of Lisbon, Portugal and CISUC, Department of Informatics Engineering, University of Coimbra, Portugal

  • Venue:
  • EuroGP'13: Proceedings of the 16th European Conference on Genetic Programming
  • Year:
  • 2013

Abstract

Generalization is the ability of a model to perform well on cases not seen during the training phase. In Genetic Programming, generalization has recently been recognized as an important open issue, and increased efforts are being made towards evolving models that do not overfit. In this work we expand on recent developments showing that using a small and frequently changing subset of the training data is effective in reducing overfitting and improving generalization. In particular, we build upon the idea of randomly choosing a single training instance at each generation and balance it with periodically using all training data. The motivation for this approach is to keep overfitting low (by using a single training instance) while still providing enough information for a general pattern to be found (by using all training data). We propose two approaches, called interleaved sampling and random interleaved sampling, which perform this balancing in a deterministic and a probabilistic way, respectively. Experiments are conducted on three high-dimensional real-life datasets from the pharmacokinetics domain. Results show that most of the variants of the proposed approaches are able to consistently improve generalization and reduce overfitting when compared to standard Genetic Programming. The best variants achieve such improvements even on a dataset where a recent and representative state-of-the-art method could not. Furthermore, the resulting models are short and hence easier to interpret, an important achievement from the applications' point of view.
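
The two sampling schedules described above can be illustrated with a minimal Python sketch. The function names, the interleave period, and the probability of using the full training set are illustrative assumptions, not the authors' actual implementation or parameter choices; the sketch only shows how the set of fitness cases could be selected at each generation.

    import random

    def interleaved_sampling(training_data, generation, period=10):
        # Deterministic variant (sketch): every `period` generations the full
        # training set is used; in all other generations a single randomly
        # chosen instance is used. The period value is an assumption.
        if generation % period == 0:
            return training_data
        return [random.choice(training_data)]

    def random_interleaved_sampling(training_data, p_full=0.1):
        # Probabilistic variant (sketch): with probability `p_full` the full
        # training set is used; otherwise a single randomly chosen instance
        # is used. The probability value is an assumption.
        if random.random() < p_full:
            return training_data
        return [random.choice(training_data)]

    if __name__ == "__main__":
        # Toy usage inside a hypothetical GP loop: print how many fitness
        # cases each generation would be evaluated on.
        data = [(x, 2 * x + 1) for x in range(100)]  # toy regression cases
        for generation in range(6):
            cases = interleaved_sampling(data, generation, period=3)
            print(f"generation {generation}: {len(cases)} fitness case(s)")

In this sketch, the selected fitness cases would simply replace the full training set when evaluating the population at a given generation; everything else in the evolutionary loop stays as in standard Genetic Programming.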