Genetic programming: on the programming of computers by means of natural selection
The nature of statistical learning theory
Genetic programming: an introduction: on the automatic evolution of computer programs and its applications
The evolution of size and shape. In: Advances in genetic programming
Size Fair and Homologous Tree Crossovers for Tree Genetic Programming. Genetic Programming and Evolvable Machines
What Makes a Problem GP-Hard? Analysis of a Tunably Difficult Problem in Genetic Programming. Genetic Programming and Evolvable Machines
Accurate Replication in Genetic Programming. In: Proceedings of the 6th International Conference on Genetic Algorithms
Complexity Compression and Evolution. In: Proceedings of the 6th International Conference on Genetic Algorithms
Lexicographic Parsimony Pressure. In: GECCO '02: Proceedings of the Genetic and Evolutionary Computation Conference
Maintaining the Diversity of Genetic Programs. In: EuroGP '02: Proceedings of the 5th European Conference on Genetic Programming
Exons and Code Growth in Genetic Programming. In: EuroGP '02: Proceedings of the 5th European Conference on Genetic Programming
Problem Difficulty and Code Growth in Genetic Programming. Genetic Programming and Evolvable Machines
Balancing accuracy and parsimony in genetic programming. Evolutionary Computation
Effects of code growth and parsimony pressure on populations in genetic programming. Evolutionary Computation
Dynamic maximum tree depth: a simple technique for avoiding bloat in tree-based GP. In: GECCO '03: Proceedings of the 2003 International Conference on Genetic and Evolutionary Computation, Part II
A quantitative study of learning and generalization in genetic programming. In: EuroGP '11: Proceedings of the 14th European Conference on Genetic Programming
Penalty functions for genetic programming algorithms. In: ICCSA '11: Proceedings of the 2011 International Conference on Computational Science and Its Applications, Volume Part I
Evolutionary computation for supervised learning. In: Proceedings of the 15th Annual Conference Companion on Genetic and Evolutionary Computation
This paper proposes a theoretical analysis of Genetic Programming (GP) from the perspective of statistical learning theory, a well-grounded mathematical toolbox for machine learning. By computing the Vapnik-Chervonenkis dimension of the family of programs that a specific GP setting can infer, it is proved that a parsimonious fitness ensures universal consistency: empirical error minimization converges to the best possible error as the number of test cases goes to infinity. However, it is also proved that the standard method of imposing a hard limit on program size still yields programs whose size grows without bound as a function of their accuracy. It is further shown that using cross-validation or hold-out to choose the complexity level that optimizes the generalization error also leads to bloat. A more elaborate modification of the fitness is therefore proposed, which avoids unnecessary bloat while preserving universal consistency.
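To make the idea of a parsimonious fitness concrete, here is a minimal sketch, not the paper's exact scheme: a fitness that adds a size penalty to the empirical error, so that among equally accurate programs the smaller one is preferred. The representation of programs as `(run, size)` pairs and the penalty weight `alpha` are illustrative assumptions.

```python
# Hypothetical sketch of a parsimonious (size-penalized) fitness for GP.
# `run` is the program as a callable, `size` its node count; both are
# assumed representations, not the paper's formal setting.

def empirical_error(run, test_cases):
    """Fraction of test cases the program answers incorrectly."""
    return sum(1 for x, y in test_cases if run(x) != y) / len(test_cases)

def parsimonious_fitness(run, size, test_cases, alpha=0.01):
    """Empirical error plus a penalty proportional to program size.

    Penalizing size bounds the complexity of the programs actually
    selected, which is what allows the minimizer of this fitness to
    converge to the best possible error as test cases accumulate.
    """
    return empirical_error(run, test_cases) + alpha * size

# Usage: two programs with identical behaviour; the smaller one scores better.
cases = [(x, x % 2) for x in range(100)]
small = (lambda x: x % 2, 3)     # 3 nodes, perfect accuracy
bloated = (lambda x: x % 2, 40)  # same behaviour, 40 nodes
assert parsimonious_fitness(*small, cases) < parsimonious_fitness(*bloated, cases)
```

Selection against this fitness, rather than raw accuracy with a hard size cap, is the kind of pressure the abstract argues is needed to avoid bloat while keeping consistency.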