Random sampling technique for overfitting control in genetic programming

  • Authors:
  • Ivo Gonçalves; Sara Silva; Joana B. Melo; João M. B. Carreiras

  • Affiliations:
  • ECOS/CISUC, DEI/FCTUC, University of Coimbra, Portugal; INESC-ID Lisboa, IST, Technical University of Lisbon, Portugal and ECOS/CISUC, DEI/FCTUC, University of Coimbra, Portugal; GeoDES, Tropical Research Institute (IICT), Lisbon, Portugal; GeoDES, Tropical Research Institute (IICT), Lisbon, Portugal

  • Venue:
  • EuroGP'12: Proceedings of the 15th European Conference on Genetic Programming
  • Year:
  • 2012

Abstract

Generalization has received less research attention in Genetic Programming (GP) than in other Machine Learning methods. Generalization is the ability of a solution to perform well on unseen cases. It is one of the most important goals of any Machine Learning method, yet only recently has it started to receive more attention in GP. In this work we perform a comparative analysis of a particularly interesting configuration of the Random Sampling Technique (RST) against the Standard GP approach. Experiments are conducted on three real-world multidimensional symbolic regression datasets, the first two in the pharmacokinetics domain and the third in the forestry domain. The results show that the RST decreases overfitting on all datasets. It also improves testing fitness on two of the three datasets, and it does so while producing considerably smaller and less complex solutions. We discuss the possible reasons for the good performance of the RST, as well as its possible limitations.
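
The abstract does not spell out the RST configuration that was studied, so the following is only a rough sketch of the general idea usually associated with random sampling in GP: in each generation, fitness is computed on a freshly drawn random subset of the training cases rather than on the full training set. Every name here (`rmse`, `sample_cases`, `evaluate_generation`, `overfitting_gap`), the choice of RMSE as fitness, the sampling without replacement, and the gap-based overfitting indicator are illustrative assumptions, not the authors' implementation or their overfitting measure.

```python
import random


def rmse(model, cases):
    """Root mean squared error of `model` over (inputs, target) pairs."""
    total = sum((model(x) - y) ** 2 for x, y in cases)
    return (total / len(cases)) ** 0.5


def sample_cases(training_cases, sample_size, rng):
    """Draw a random subset of the training cases (assumed: without replacement)."""
    return rng.sample(training_cases, sample_size)


def evaluate_generation(population, training_cases, sample_size, rng):
    """Score every individual on the same random subset for this generation.

    Calling this once per generation gives each generation a different subset,
    which is the assumed core of the random sampling idea.
    """
    subset = sample_cases(training_cases, sample_size, rng)
    return [rmse(individual, subset) for individual in population]


def overfitting_gap(model, training_cases, test_cases):
    """Hypothetical overfitting indicator: test error minus training error."""
    return rmse(model, test_cases) - rmse(model, training_cases)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy one-dimensional regression data: target = 2 * x.
    cases = [((float(i),), 2.0 * i) for i in range(10)]
    exact = lambda x: 2.0 * x[0]         # fits the data perfectly
    shifted = lambda x: 2.0 * x[0] + 1.0  # constant error of 1 on every case
    print(evaluate_generation([exact, shifted], cases, 3, rng))
```

In a full GP run, `evaluate_generation` would stand in for the usual fitness evaluation step, while selection and variation stay unchanged; the sample size and how often the subset is redrawn are the parameters such a technique would expose.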