Predicting problem difficulty for genetic programming applied to data classification

Authors:
Leonardo Trujillo;Yuliana Martínez;Edgar Galván-López;Pierrick Legrand
Affiliations:
Instituto Tecnológico de Tijuana, Tijuana, Mexico;Instituto Tecnológico de Tijuana, Tijuana, Mexico;University of Essex, Essex, United Kingdom;Université Victor Segalen Bordeaux 2, Bordeaux, France
Venue:
Proceedings of the 13th annual conference on Genetic and evolutionary computation
Year:
2011

Abstract

When developing applied systems, an important problem is choosing the correct tools for a given domain or scenario. The genetic programming (GP) community has addressed this general task by attempting to determine the intrinsic difficulty that a problem poses for a GP search. This paper presents an approach to predict the performance of GP applied to data classification, one of the most common problems in computer science. The novelty of the proposal is to extract statistical and complexity descriptors of the problem data and, from these, estimate the expected performance of a GP classifier. We derive two types of predictive models: linear regression models and symbolic regression models evolved with GP. Experimental results on synthetic and real-world problems show that both approaches provide good estimates of classifier performance. In conclusion, this paper shows that it is possible to accurately predict the expected performance of a GP classifier using a set of descriptors that characterize the problem data.
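
The general idea of mapping data descriptors to expected classifier performance with a linear model can be sketched as follows. This is only an illustration under stated assumptions, not the authors' method: it uses a single descriptor (Fisher's discriminant ratio, chosen here for convenience), synthetic one-feature problems, and a simple midpoint threshold rule as a stand-in for the GP classifier whose error would be measured in practice. All function names are hypothetical.

```python
import random
import statistics


def fisher_ratio(class_a, class_b):
    """Fisher's discriminant ratio for one feature: (mu_a - mu_b)^2 / (var_a + var_b)."""
    mu_a, mu_b = statistics.fmean(class_a), statistics.fmean(class_b)
    var_a, var_b = statistics.pvariance(class_a), statistics.pvariance(class_b)
    return (mu_a - mu_b) ** 2 / (var_a + var_b)


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def make_problem(rng, separation, n=200):
    """Synthetic two-class, one-feature problem (placeholder data, separation > 0)."""
    class_a = [rng.gauss(0.0, 1.0) for _ in range(n)]
    class_b = [rng.gauss(separation, 1.0) for _ in range(n)]
    return class_a, class_b


def threshold_error(class_a, class_b):
    """Assumed stand-in for measured classifier error: midpoint threshold rule."""
    cut = (statistics.fmean(class_a) + statistics.fmean(class_b)) / 2
    wrong = sum(a > cut for a in class_a) + sum(b <= cut for b in class_b)
    return wrong / (len(class_a) + len(class_b))


# Build a small set of training problems of varying difficulty.
rng = random.Random(0)
problems = [make_problem(rng, sep) for sep in (0.5, 1.0, 1.5, 2.0, 2.5, 3.0)]
descriptors = [fisher_ratio(a, b) for a, b in problems]
errors = [threshold_error(a, b) for a, b in problems]

# Fit the predictive model: descriptor value -> expected error.
slope, intercept = fit_line(descriptors, errors)

# Predict the error on an unseen problem from its descriptor alone.
new_a, new_b = make_problem(rng, 1.75)
predicted = slope * fisher_ratio(new_a, new_b) + intercept
print(f"slope={slope:.3f} intercept={intercept:.3f} predicted_error={predicted:.3f}")
```

A fuller version would use several descriptors per problem and a multivariate regression, with the target values taken from actual GP classification runs.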