Genetic programming, validation sets, and parsimony pressure

  • Authors:
  • Christian Gagné; Marc Schoenauer; Marc Parizeau; Marco Tomassini

  • Affiliations:
  • Équipe TAO – INRIA Futurs, LRI Bat. 490, Université Paris Sud, Orsay, France; Équipe TAO – INRIA Futurs, LRI Bat. 490, Université Paris Sud, Orsay, France; Laboratoire de Vision et Systèmes Numériques (LVSN), Département de Génie Électrique et de Génie Informatique, Université Laval, Québec (QC), Canada; Information Systems Institute, Université de Lausanne, Dorigny, Switzerland

  • Venue:
  • EuroGP'06 Proceedings of the 9th European conference on Genetic Programming
  • Year:
  • 2006

Abstract

Fitness functions based on test cases are very common in Genetic Programming (GP). This process can be seen as a learning task, where models are inferred from a limited number of samples. This paper investigates two methods to improve generalization in GP-based learning: 1) selecting the best-of-run individual using a three-data-set methodology, and 2) applying parsimony pressure to reduce the complexity of the solutions. Results using GP in a binary classification setup show that, while accuracy on the test sets is preserved with less variance than the baseline results, the mean tree size obtained with the tested methods is significantly reduced.
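To make the three-data-set methodology concrete, the sketch below (not the authors' implementation) splits the labelled samples into training, validation, and test sets, and picks the best-of-run individual by validation accuracy, breaking ties toward smaller trees as a simple stand-in for parsimony pressure. The names `accuracy`, `ind.size`, and the split fractions are illustrative assumptions, not taken from the paper.

```python
import random

def split_three_ways(samples, train_frac=0.5, valid_frac=0.25, seed=0):
    """Shuffle labelled samples and split them into train/validation/test sets.
    The fractions here are arbitrary placeholders, not the paper's setup."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_valid = int(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test

def select_best_of_run(candidates, valid_set, accuracy):
    """Choose the best-of-run individual by accuracy on the validation set.
    Ties are broken in favour of the smaller tree (a crude parsimony bias);
    `accuracy(ind, data)` and `ind.size` are hypothetical interfaces to a GP engine."""
    return max(candidates, key=lambda ind: (accuracy(ind, valid_set), -ind.size))
```

The key point the sketch illustrates is that the validation set is used only to choose among evolved candidates, while the test set is reserved for reporting generalization accuracy.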