Motivation: Biomarker discovery is an important topic in biomedical applications of computational biology, including tasks such as gene and SNP selection from high-dimensional data. Surprisingly, the robustness of such selection processes, that is, their stability with respect to sampling variation, has received attention only recently. Yet the robustness of biomarkers matters, because it may greatly influence subsequent biological validation. In addition, a more robust set of markers may strengthen an expert's confidence in the results of a selection method.

Results: Our first contribution is a general framework for analysing the robustness of a biomarker selection algorithm. Second, we conducted a large-scale analysis of the recently introduced concept of ensemble feature selection, in which multiple feature selections are combined to increase the robustness of the final set of selected features. We focus on selection methods that are embedded in the estimation of support vector machines (SVMs). SVMs are powerful classification models that have shown state-of-the-art performance on several diagnosis and prognosis tasks on biological data, and their feature selection extensions have also given good results for gene selection tasks. We show that the robustness of SVMs for biomarker discovery can be substantially increased by using ensemble feature selection techniques, while at the same time improving classification performance. The proposed methodology is evaluated on four microarray datasets, showing increases of up to almost 30% in the robustness of the selected biomarkers, along with an improvement of ~15% in classification performance. The stability improvement with ensemble methods is particularly noticeable for small signature sizes (a few tens of genes), which are the most relevant for designing a diagnosis or prognosis model from a gene signature.
Contact: yvan.saeys@psb.ugent.be
Supplementary information: Supplementary data are available at Bioinformatics online.
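The ensemble idea summarised in the Results can be illustrated with a small sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: a simple signal-to-noise score stands in for the SVM-embedded ranker, the ensemble is built from bootstrap samples whose rankings are aggregated by summing ranks, and robustness is estimated as the average pairwise Jaccard overlap between signatures selected on random 90% subsamples. All function names and the synthetic data are hypothetical.

```python
import random
from itertools import combinations
from statistics import mean, pstdev


def rank_features(X, y):
    """Rank features (0 = most relevant) by a signal-to-noise score.

    Assumption: this score is a stand-in for SVM weight magnitudes.
    Both classes (labels 0 and 1) must be present in y.
    """
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row in pos]
        b = [row[j] for row in neg]
        spread = pstdev(a) + pstdev(b) + 1e-12  # avoid division by zero
        scores.append(abs(mean(a) - mean(b)) / spread)
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0] * len(scores)
    for position, j in enumerate(order):
        ranks[j] = position
    return ranks


def single_select(X, y, k):
    """Baseline: top-k features from one ranking of the data."""
    ranks = rank_features(X, y)
    return set(sorted(range(len(ranks)), key=lambda j: ranks[j])[:k])


def bootstrap(X, y, rng):
    """Draw a bootstrap sample that contains both classes."""
    n = len(X)
    while True:
        idx = [rng.randrange(n) for _ in range(n)]
        if len({y[i] for i in idx}) == 2:
            return [X[i] for i in idx], [y[i] for i in idx]


def ensemble_select(X, y, k, n_bags=20, rng=None):
    """Top-k features after summing ranks over bootstrap rankings."""
    rng = rng or random.Random(0)
    total = [0] * len(X[0])
    for _ in range(n_bags):
        Xb, yb = bootstrap(X, y, rng)
        for j, r in enumerate(rank_features(Xb, yb)):
            total[j] += r
    return set(sorted(range(len(total)), key=lambda j: total[j])[:k])


def jaccard(a, b):
    """Overlap between two non-empty signatures, in [0, 1]."""
    return len(a & b) / len(a | b)


def stability(select, X, y, k, n_subsamples=10, frac=0.9, rng=None):
    """Average pairwise Jaccard overlap of signatures across subsamples."""
    rng = rng or random.Random(1)
    n = len(X)
    signatures = []
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), int(frac * n))
        signatures.append(select([X[i] for i in idx], [y[i] for i in idx], k))
    pairs = list(combinations(signatures, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Synthetic data: 40 samples, 20 features, features 0 and 1 informative.
    data_rng = random.Random(0)
    y = [i % 2 for i in range(40)]
    X = [[(5.0 * label if j < 2 else 0.0) + data_rng.gauss(0, 1)
          for j in range(20)] for label in y]
    print("single  :", stability(single_select, X, y, k=5))
    print("ensemble:", stability(ensemble_select, X, y, k=5))
```

Comparing the two printed values on a real dataset, over a range of signature sizes `k`, would mirror the kind of robustness comparison the abstract describes. The toy data here is too easy to say anything about the size of the effect.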