Penalty functions for genetic programming algorithms

  • Authors:
  • José L. Montaña; César L. Alonso; Cruz Enrique Borges; Javier De La Dehesa

  • Affiliations:
  • Departamento de Matemáticas, Estadística y Computación, Universidad de Cantabria, Santander, Spain; Centro de Inteligencia Artificial, Universidad de Oviedo, Gijón, Spain; Departamento de Matemáticas, Estadística y Computación, Universidad de Cantabria, Santander, Spain; Departamento de Matemáticas, Estadística y Computación, Universidad de Cantabria, Santander, Spain

  • Venue:
  • ICCSA'11: Proceedings of the 2011 International Conference on Computational Science and Its Applications - Volume Part I
  • Year:
  • 2011

Abstract

Very often symbolic regression, as addressed in Genetic Programming (GP), is equivalent to approximate interpolation. This means that, in general, GP algorithms try to fit the sample as well as possible, but no notion of generalization error is considered. As a consequence, overfitting, code bloat and noisy data are problems that are not satisfactorily solved under this approach. Motivated by this situation, we review the problem of symbolic regression from the perspective of Machine Learning, a well-founded mathematical toolbox for predictive learning. We perform empirical comparisons between classical statistical methods (AIC and BIC) and methods based on Vapnik-Chervonenkis (VC) theory for regression problems under genetic training. Empirical comparisons of the different methods suggest practical advantages of VC-based model selection. We conclude that VC theory provides a methodological framework for complexity control in Genetic Programming even when its technical results seem not to be directly applicable. As the main practical advantage, precise penalty functions founded on the notion of generalization error are proposed for evolving GP trees.
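
The abstract contrasts AIC/BIC penalties with a VC-based one. As a minimal sketch of what such penalty functions look like, the snippet below implements the standard AIC and BIC scores in their Gaussian-noise form together with Vapnik's penalization factor as popularized by Cherkassky and Mulier. Using a GP tree's node count as a stand-in for the number of parameters and for the VC dimension is a hypothetical proxy for illustration, not necessarily the estimator used in the paper.

```python
import math

def aic(rss, n, k):
    """Akaike Information Criterion (Gaussian-noise form) for a model
    with k free parameters fitted to n samples with residual sum of
    squares rss."""
    return n * math.log(rss / n) + 2 * k

def bic(rss, n, k):
    """Bayesian (Schwarz) Information Criterion, same conventions."""
    return n * math.log(rss / n) + k * math.log(n)

def vc_penalized_risk(rss, n, h):
    """Empirical risk inflated by Vapnik's penalization factor in the
    Cherkassky-Mulier form, with h an estimate of the VC dimension of
    the model class.  Returns inf when the bound is vacuous."""
    p = h / n
    if p <= 0:
        return float("inf")
    arg = p - p * math.log(p) + math.log(n) / (2 * n)
    if arg < 0:
        return float("inf")
    denom = 1.0 - math.sqrt(arg)
    if denom <= 0.0:
        return float("inf")  # model too complex for this sample size
    return (rss / n) / denom

# Hypothetical GP use: score a candidate tree by its penalized error
# rather than raw fit, taking node count as a crude complexity proxy.
def fitness(tree_rss, n_samples, tree_size):
    return vc_penalized_risk(tree_rss, n_samples, tree_size)
```

In a GP setting, one of these scores would replace raw training error as the fitness of a candidate tree, so a larger tree must buy its extra complexity with a proportionally better fit; this is the sense in which the penalty controls overfitting and code bloat.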