Measuring bloat, overfitting and functional complexity in genetic programming

  • Authors:
  • Leonardo Vanneschi; Mauro Castelli; Sara Silva

  • Affiliations:
  • Univ. of Milano-Bicocca, Milan, Italy & INESC-ID Lisboa, Lisbon, Portugal; Univ. of Milano-Bicocca, Milan, Italy; INESC-ID Lisboa, Lisbon, Portugal & University of Coimbra, Portugal

  • Venue:
  • Proceedings of the 12th annual conference on Genetic and evolutionary computation
  • Year:
  • 2010

Abstract

Recent contributions show that eliminating bloat in a genetic programming system does not necessarily eliminate overfitting, and vice versa. This seems to contradict the minimum description length principle, widely accepted among researchers, which states that the best model is the one that minimizes the amount of information needed to encode it. Another widely held view is that overfitting should be related, in some sense, to the functional complexity of the model. The goal of this paper is to define three measures that quantify, respectively, the bloat, overfitting and functional complexity of solutions, and to show their suitability on a set of test problems including a simple bidimensional symbolic regression test function and two real-life multidimensional regression problems. The experimental results are encouraging and should pave the way for further investigation. Advantages and drawbacks of the proposed measures are discussed, and ways to improve them are suggested. In the future, these measures should be useful for studying and better understanding the relationship between bloat, overfitting and functional complexity of solutions.