Penalty functions for genetic programming algorithms

  • Authors:
  • José L. Montaña; César L. Alonso; Cruz Enrique Borges; Javier De La Dehesa

  • Affiliations:
  • Departamento de Matemáticas, Estadística y Computación, Universidad de Cantabria, Santander, Spain; Centro de Inteligencia Artificial, Universidad de Oviedo, Gijón, Spain; Departamento de Matemáticas, Estadística y Computación, Universidad de Cantabria, Santander, Spain; Departamento de Matemáticas, Estadística y Computación, Universidad de Cantabria, Santander, Spain

  • Venue:
  • ICCSA'11: Proceedings of the 2011 International Conference on Computational Science and Its Applications - Volume Part I
  • Year:
  • 2011

Abstract

Very often symbolic regression, as addressed in Genetic Programming (GP), is equivalent to approximate interpolation. This means that, in general, GP algorithms try to fit the sample as well as possible, but no notion of generalization error is considered. As a consequence, overfitting, code bloat and noisy data are problems that are not satisfactorily solved under this approach. Motivated by this situation, we review the problem of symbolic regression from the perspective of Machine Learning, a well-founded mathematical toolbox for predictive learning. We perform empirical comparisons between classical statistical methods (AIC and BIC) and methods based on Vapnik-Chervonenkis (VC) theory for regression problems under genetic training. Empirical comparisons of the different methods suggest practical advantages of VC-based model selection. We conclude that VC theory provides a methodological framework for complexity control in Genetic Programming even when its technical results seem not to be directly applicable. As the main practical advantage, precise penalty functions founded on the notion of generalization error are proposed for evolving GP trees.
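
The abstract contrasts AIC/BIC penalties with a VC-based one. As a minimal sketch of what such penalty functions look like, the snippet below implements the standard AIC and BIC scores in their Gaussian-noise form together with Vapnik's penalization factor as popularized by Cherkassky and Mulier. Using a GP tree's node count as a stand-in for the number of parameters and for the VC dimension is a hypothetical proxy for illustration, not necessarily the estimator used in the paper.

```python
import math

def aic(rss, n, k):
    """Akaike Information Criterion (Gaussian-noise form) for a model
    with k free parameters fitted to n samples with residual sum of
    squares rss."""
    return n * math.log(rss / n) + 2 * k

def bic(rss, n, k):
    """Bayesian (Schwarz) Information Criterion, same conventions."""
    return n * math.log(rss / n) + k * math.log(n)

def vc_penalized_risk(rss, n, h):
    """Empirical risk inflated by Vapnik's penalization factor in the
    Cherkassky-Mulier form, with h an estimate of the VC dimension of
    the model class.  Returns inf when the bound is vacuous."""
    p = h / n
    if p <= 0:
        return float("inf")
    arg = p - p * math.log(p) + math.log(n) / (2 * n)
    if arg < 0:
        return float("inf")
    denom = 1.0 - math.sqrt(arg)
    if denom <= 0.0:
        return float("inf")  # model too complex for this sample size
    return (rss / n) / denom

# Hypothetical GP use: score a candidate tree by its penalized error
# rather than raw fit, taking node count as a crude complexity proxy.
def fitness(tree_rss, n_samples, tree_size):
    return vc_penalized_risk(tree_rss, n_samples, tree_size)
```

In a GP setting, one of these scores would replace raw training error as the fitness of a candidate tree, so a larger tree must buy its extra complexity with a proportionally better fit; this is the sense in which the penalty controls overfitting and code bloat.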