Random sampling technique for overfitting control in genetic programming

Authors:
Ivo Gonçalves;Sara Silva;Joana B. Melo;João M. B. Carreiras
Affiliations:
ECOS/CISUC, DEI/FCTUC, University of Coimbra, Portugal;INESC-ID Lisboa, IST, Technical University of Lisbon, Portugal and ECOS/CISUC, DEI/FCTUC, University of Coimbra, Portugal;GeoDES, Tropical Research Institute (IICT), Lisbon, Portugal;GeoDES, Tropical Research Institute (IICT), Lisbon, Portugal
Venue:
EuroGP'12 Proceedings of the 15th European conference on Genetic Programming
Year:
2012


Abstract

Generalization is an area of Genetic Programming (GP) that has received less research attention than it has in other Machine Learning methods. Generalization is the ability of a solution to perform well on unseen cases. It is one of the most important goals of any Machine Learning method, yet in GP it has only recently started to receive more attention. In this work we perform a comparative analysis of a particularly interesting configuration of the Random Sampling Technique (RST) against the Standard GP approach. Experiments are conducted on three multidimensional, real-world symbolic regression datasets: the first two from the pharmacokinetics domain and the third from the forestry domain. The results show that the RST decreases overfitting on all three datasets. It also improves testing fitness on two of the three datasets. Furthermore, it does so while producing considerably smaller and less complex solutions. We discuss the possible reasons for the good performance of the RST, as well as its possible limitations.
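
The abstract does not spell out how the RST works. The Python sketch below illustrates one common reading of random sampling in GP, under a stated assumption: fitness is computed on a random subset of the training cases, and the subset is redrawn every few generations. The names `subset_schedule`, `subset_size`, and `change_every` are hypothetical and are not taken from the paper, and the candidate model is a plain function standing in for an evolved program.

```python
import math
import random


def sample_indices(n_cases, subset_size, rng):
    """Draw a random subset of training-case indices without replacement."""
    return rng.sample(range(n_cases), subset_size)


def rmse(model, X, y, indices):
    """Root mean squared error of `model` on the selected cases only."""
    squared_errors = [(model(X[i]) - y[i]) ** 2 for i in indices]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def subset_schedule(n_cases, subset_size, generations, change_every, seed=0):
    """Yield (generation, indices) pairs.

    Assumption: a fresh subset is drawn every `change_every` generations
    and reused in between. Both parameters are illustrative only.
    """
    rng = random.Random(seed)
    indices = None
    for gen in range(generations):
        if gen % change_every == 0:
            indices = sample_indices(n_cases, subset_size, rng)
        yield gen, indices


# Toy data: y = 2x, with a candidate that happens to match it exactly.
X = [[float(i)] for i in range(10)]
y = [2.0 * row[0] for row in X]
candidate = lambda row: 2.0 * row[0]  # stand-in for an evolved program

for gen, idx in subset_schedule(len(X), subset_size=3, generations=4, change_every=2):
    print(gen, sorted(idx), rmse(candidate, X, y, idx))
```

In a full GP run, the error computed on the current subset would drive selection, while overfitting would be judged separately by comparing error on the full training set with error on a held-out test set.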