Sparse and stable gene selection with consensus SVM-RFE

Authors:
E. Tapia;P. Bulacio;L. Angelone
Affiliations:
CIFASIS - Conicet, Centro Internacional Franco Argentino de Ciencias de la Informacion y de Sistemas, 27 de Febrero 210 bis, S2000EZP Rosario, Argentina and Facultad de Ciencias Exactas, Ingenier& ... (all three authors)
Venue:
Pattern Recognition Letters
Year:
2012

Abstract

We describe a method for sparse and stable gene selection that combines a number of unstable but low-cost SVM-RFE units, referred to as SVM-RFE subunits. Using a comprehensive simulation study, we show that two constraints, a consensus constraint with respect to variations in the gene removal policy and a stability constraint with respect to perturbations in the training data, can substantially improve the gene selection precision, dimensionality reduction ratio, and stability of low-cost SVM-RFE subunits while keeping computational costs affordable. The method, which does not require the number of selected genes to be fixed in advance, has two stages. In the first stage (spreading), multiple rough gene removal policies are applied to multiple surrogate training datasets. In the second stage (despreading), multiple consensus gene sets with respect to variations in the gene removal policy are obtained and passed through a stability filter, which selects the best-performing gene set. Thus the consensus constraint performs strong dimensionality reduction at affordable computational cost, while the stability constraint ensures acceptable gene selection stability and further dimensionality reduction. The method is validated on three benchmark microarray datasets.
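
The two stages described in the abstract can be sketched as follows. This is only an illustrative reading, not the authors' implementation: the abstract does not specify how the consensus is formed or how the stability filter scores gene sets. The sketch assumes that (a) the consensus for each surrogate dataset is the intersection of the gene sets selected under every removal policy, and (b) the stability filter keeps the consensus set with the highest mean Jaccard similarity to the other consensus sets. The names `spread`, `consensus`, `jaccard`, `despread`, and the `select` callback (standing in for one SVM-RFE subunit) are all hypothetical.

```python
from typing import Callable, FrozenSet, Iterable, List, Sequence

GeneSet = FrozenSet[str]


def spread(datasets: Sequence, policies: Sequence,
           select: Callable[[object, object], Iterable[str]]) -> List[List[GeneSet]]:
    """Spreading: run one subunit per (surrogate dataset, removal policy) pair.

    `select` is a placeholder for an SVM-RFE subunit; it is assumed to
    return the genes that survive elimination under the given policy.
    """
    return [[frozenset(select(d, p)) for p in policies] for d in datasets]


def consensus(sets_per_policy: Sequence[GeneSet]) -> GeneSet:
    """Assumed consensus rule: genes selected under every removal policy."""
    return frozenset.intersection(*sets_per_policy)


def jaccard(a: GeneSet, b: GeneSet) -> float:
    """Jaccard similarity; two empty sets are treated as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def despread(selections: Sequence[Sequence[GeneSet]]) -> GeneSet:
    """Despreading: one consensus set per dataset, then a stability filter.

    Assumed filter: keep the consensus set that agrees most, on average,
    with the consensus sets obtained from the other surrogate datasets.
    """
    consensus_sets = [consensus(per_policy) for per_policy in selections]
    if len(consensus_sets) == 1:
        return consensus_sets[0]

    def mean_similarity(i: int) -> float:
        sims = [jaccard(consensus_sets[i], c)
                for j, c in enumerate(consensus_sets) if j != i]
        return sum(sims) / len(sims)

    best = max(range(len(consensus_sets)), key=mean_similarity)
    return consensus_sets[best]


# Toy subunit outputs: selections[d][p] is the gene set kept on surrogate
# dataset d under removal policy p (made-up gene names, not real data).
selections = [
    [frozenset({"g1", "g2", "g3", "g7"}), frozenset({"g1", "g2", "g3", "g9"})],
    [frozenset({"g1", "g2", "g4"}), frozenset({"g1", "g2", "g5"})],
    [frozenset({"g1", "g2", "g3", "g6"}), frozenset({"g1", "g2", "g3", "g6", "g8"})],
]
selected = despread(selections)
```

Note that the number of selected genes is never passed in: under these assumptions it falls out of the intersection and the filter, which mirrors the abstract's claim that the method needs no preselected gene count.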