Repairing fractures between data using genetic programming-based feature extraction: A case study in cancer diagnosis

Authors:
Jose G. Moreno-Torres;Xavier Llorà;David E. Goldberg;Rohit Bhargava
Affiliations:
Department of Computer Science and Artificial Intelligence, Universidad de Granada, 18071 Granada, Spain;National Center for Supercomputing Applications (NCSA), University of Illinois at Urbana-Champaign, 1205 W. Clark Street, Urbana, Illinois, USA;Illinois Genetic Algorithms Laboratory (IlliGAL), University of Illinois at Urbana-Champaign, 104 S. Mathews Ave, Urbana, Illinois, USA;Department of Bioengineering, University of Illinois at Urbana-Champaign, 405 N. Mathews Ave, Urbana, Illinois, USA
Venue:
Information Sciences: an International Journal
Year:
2013

Abstract

Most model-building processes rest on an underlying assumption: a learned classifier should be usable to explain unseen data from the same problem. This seemingly reasonable assumption tends to fail with biological data, where data generated in two different laboratories using the same protocols can lead to two different, non-interchangeable classifiers. The process of generating data in the lab usually involves too many uncontrollable variables, on top of biological variation, and small differences can lead to very different data distributions, with a fracture between the data. This paper presents a genetics-based machine learning approach that performs feature extraction on data from one laboratory to increase the classification performance of an existing classifier built using data from a different laboratory that follows the same protocols, while also learning about the shape of the fractures between the data that caused the poor performance. An experimental analysis over benchmark problems, together with a real-world problem in prostate cancer diagnosis, shows that the proposed algorithm behaves well.
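
The idea of repairing a fracture for a fixed classifier can be illustrated with a toy sketch. This is not the authors' algorithm: it replaces genetic programming with a plain random search over linear transforms, and the synthetic one-feature data, the form of the fracture, and every function name (`classifier`, `fracture`, `search_repair`, `demo`) are assumptions made only for illustration. The classifier stands for one built on "lab A" data and is never retrained; the search looks for a transform of "lab B" data that restores its accuracy.

```python
import random


def classifier(x):
    """Existing classifier, assumed to be built on lab A data; never retrained."""
    return 1 if x > 0.0 else 0


def make_lab_a(n, rng):
    """Synthetic lab A samples: class 0 centered at -2, class 1 centered at +2."""
    samples = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 2.0 if label == 1 else -2.0
        samples.append((rng.gauss(center, 0.5), label))
    return samples


def fracture(x):
    """Hypothetical fracture: lab B records the same quantity rescaled and offset."""
    return 0.5 * x + 3.0


def accuracy(samples, a=1.0, b=0.0):
    """Accuracy of the fixed classifier on samples mapped through a * x + b."""
    hits = sum(classifier(a * x + b) == label for x, label in samples)
    return hits / len(samples)


def search_repair(samples_b, rng, iterations=2000):
    """Random search for a linear repair (a, b); a simple stand-in for GP."""
    best_a, best_b = 1.0, 0.0
    best_acc = accuracy(samples_b)
    for _ in range(iterations):
        a, b = rng.uniform(-5.0, 5.0), rng.uniform(-10.0, 10.0)
        acc = accuracy(samples_b, a, b)
        if acc > best_acc:
            best_a, best_b, best_acc = a, b, acc
    return best_a, best_b, best_acc


def demo(seed=0):
    """Return (accuracy on lab A, accuracy on lab B as is, accuracy on lab B repaired)."""
    rng = random.Random(seed)
    lab_a = make_lab_a(200, rng)
    lab_b = [(fracture(x), label) for x, label in make_lab_a(200, rng)]
    acc_a = accuracy(lab_a)
    acc_b_before = accuracy(lab_b)
    _, _, acc_b_after = search_repair(lab_b, rng)
    return acc_a, acc_b_before, acc_b_after


if __name__ == "__main__":
    print(demo())
```

In this sketch the recovered pair `(a, b)` also describes the shape of the simulated fracture, which loosely mirrors the abstract's point that the extracted features reveal how the two data sources differ.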