Using crossover based similarity measure to improve genetic programming generalization ability

Authors:
Leonardo Vanneschi;Steven Gustafson
Affiliations:
University of Milano-Bicocca, Milan, Italy;GE Global Research, Niskayuna, NY, USA
Venue:
Proceedings of the 11th Annual conference on Genetic and evolutionary computation
Year:
2009

Citing 19
Cited 5

Genetic programming (videotape): the movie

Genetic programming (videotape): the movie
The Role of Occam's Razor in Knowledge Discovery

Data Mining and Knowledge Discovery
An Evaluation of Evolutionary Generalisation in Genetic Programming

Artificial Intelligence Review
Fitness Distance Correlation as a Measure of Problem Difficulty for Genetic Algorithms

Proceedings of the 6th International Conference on Genetic Algorithms
The Effect of Extensive Use of the Mutation Operator on Generalization in Genetic Programming Using Sparse Data Sets

PPSN IV Proceedings of the 4th International Conference on Parallel Problem Solving from Nature
Maintaining the Diversity of Genetic Programs

EuroGP '02 Proceedings of the 5th European Conference on Genetic Programming
General schema theory for genetic programming with subtree-swapping crossover: part I

Evolutionary Computation
Problem Difficulty and Code Growth in Genetic Programming

Genetic Programming and Evolvable Machines
A Study of Fitness Distance Correlation as a Difficulty Measure in Genetic Programming

Evolutionary Computation
Genetic programming for human oral bioavailability of drugs

Proceedings of the 8th annual conference on Genetic and evolutionary computation
Relaxed genetic programming

Proceedings of the 8th annual conference on Genetic and evolutionary computation
Genetic programming for computational pharmacokinetics in drug discovery and development

Genetic Programming and Evolvable Machines
Benchmarking the generalization capabilities of a compiling genetic programming system using sparse data sets

GECCO '96 Proceedings of the 1st annual conference on Genetic and evolutionary computation
Fitness distance correlation in structural mutation genetic programming

EuroGP'03 Proceedings of the 6th European conference on Genetic programming
Genetic programming, validation sets, and parsimony pressure

EuroGP'06 Proceedings of the 9th European conference on Genetic Programming
Using subtree crossover distance to investigate genetic programming dynamics

EuroGP'06 Proceedings of the 9th European conference on Genetic Programming
Operator-Based distance for genetic programming: subtree crossover distance

EuroGP'05 Proceedings of the 8th European conference on Genetic Programming
Diversity in genetic programming: an analysis of measures and correlation with fitness

IEEE Transactions on Evolutionary Computation
Crossover-Based Tree Distance in Genetic Programming

IEEE Transactions on Evolutionary Computation

Open issues in genetic programming

Genetic Programming and Evolvable Machines
The role of syntactic and semantic locality of crossover in genetic programming

PPSN'10 Proceedings of the 11th international conference on Parallel problem solving from nature: Part II
Improving the generalisation ability of genetic programming with semantic similarity based crossover

EuroGP'10 Proceedings of the 13th European conference on Genetic Programming
Random sampling technique for overfitting control in genetic programming

EuroGP'12 Proceedings of the 15th European conference on Genetic Programming
Where should we stop? an investigation on early stopping for GP learning

SEAL'12 Proceedings of the 9th international conference on Simulated Evolution and Learning

Abstract

Generalization is a central issue in Machine Learning. In this paper, we present a new idea for improving the generalization ability of Genetic Programming. The idea is based on a dynamic two-layered selection algorithm, and it is tested on a real-life drug discovery regression application. The algorithm begins by using root mean squared error as fitness together with standard tournament selection. A list of individuals called "repulsors" is also kept in memory and is initially empty. When an individual is found to overfit the training set, it is inserted into the list of repulsors. When the list of repulsors is not empty, selection becomes a two-layer algorithm: the individuals participating in the tournament are not chosen at random from the population but are themselves selected, using the average dissimilarity to the repulsors as a criterion to be maximized. Two kinds of similarity/dissimilarity measures are tested for this purpose: the well-known structural (or edit) distance and the recently defined subtree crossover based similarity measure. Although simple, this idea seems to improve the generalization ability of Genetic Programming, and the presented experimental results show that Genetic Programming generalizes better when the subtree crossover based similarity measure is used, at least for the test problems studied in this paper.
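
The two-layer selection described in the abstract might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the first layer picks tournament participants, so the sketch assumes that each participant is the most dissimilar individual in a small random pool. The names `two_layer_select`, `average_dissimilarity`, `pool_size`, and `tournament_size` are hypothetical, `distance` stands in for either the structural distance or the subtree crossover based measure, and the overfitting test that fills the repulsor list is left to the caller.

```python
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def average_dissimilarity(ind: T, repulsors: Sequence[T],
                          distance: Callable[[T, T], float]) -> float:
    """Mean distance from one individual to every repulsor."""
    return sum(distance(ind, r) for r in repulsors) / len(repulsors)


def two_layer_select(population: List[T],
                     fitness: Callable[[T], float],
                     repulsors: Sequence[T],
                     distance: Callable[[T, T], float],
                     tournament_size: int = 4,
                     pool_size: int = 8,
                     rng=random) -> T:
    """Pick one parent; `fitness` is an error (e.g. RMSE), so lower is better."""
    if repulsors:
        # First layer (assumed scheme): fill each tournament slot with the
        # individual from a random pool that is, on average, farthest from
        # the repulsors.
        contestants = []
        for _ in range(tournament_size):
            pool = rng.sample(population, pool_size)
            contestants.append(
                max(pool, key=lambda ind: average_dissimilarity(
                    ind, repulsors, distance)))
    else:
        # Empty repulsor list: ordinary tournament with random participants.
        contestants = rng.sample(population, tournament_size)
    # Second layer: the tournament itself, won by the lowest error.
    return min(contestants, key=fitness)
```

With an empty repulsor list the function reduces to plain tournament selection, which matches the starting behaviour the abstract describes. Both `pool_size` and `tournament_size` must not exceed the population size, since the pools are sampled without replacement.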