Using Operator Equalisation for Prediction of Drug Toxicity with Genetic Programming

Authors:
Leonardo Vanneschi; Sara Silva
Affiliations:
Dipartimento di Informatica, Sistemistica e Comunicazione (D.I.S.Co.), University of Milano-Bicocca, Milan, Italy; CISUC, Department of Informatics Engineering, University of Coimbra, Polo II, Coimbra, Portugal
Venue:
EPIA '09 Proceedings of the 14th Portuguese Conference on Artificial Intelligence: Progress in Artificial Intelligence
Year:
2009

Abstract

Predicting the toxicity of new potential drugs is a fundamental step in the drug design process. Recent contributions have shown that, although Genetic Programming is a promising method for this task, predicting the toxicity of molecular compounds is a complex and difficult problem. In particular, when applied to drug toxicity prediction, Genetic Programming suffers from the well-known phenomenon of bloat, i.e. growth in code size during the evolutionary process without a corresponding improvement in fitness. We hypothesize that bloat might cause overfitting and thus prevent the method from discovering simpler and potentially more general solutions. For this reason, in this paper we investigate two recently defined variants of the operator equalisation bloat control method for Genetic Programming. We show that both variants remain bloat-free even on this complex problem. Nevertheless, overfitting remains an issue. Thus, contradicting the widespread idea that bloat and overfitting are strongly related, we argue that the two phenomena are independent of each other and that eliminating bloat does not necessarily eliminate overfitting.