Motivation: We propose a method for studying the stability of biomarker lists obtained from functional genomics studies. Resampling methods are commonly adopted to tune and evaluate marker-based diagnostic and prognostic systems in order to prevent selection bias. Such caution promotes honest estimation of class prediction, but it leads to alternative sets of solutions. In microarray studies, the differences between lists may be bewildering, partly because of the presence of modules of functionally related genes. Methods for assessing stability help to understand the dependency of the markers on the data or on the type of predictor, and help in selecting solutions.

Results: A computational framework for comparing sets of ranked biomarker lists is presented. Notions and algorithms are based on concepts from permutation group theory. We introduce several algebraic indicators and metric methods for symmetric groups, including the Canberra distance, a weighted version of Spearman's footrule. We also consider distances between partial lists and the aggregation of sets of lists into an optimal list based on voting theory (Borda count). The stability indicators are applied in practical situations to several synthetic, cancer microarray and proteomics datasets. The issues addressed are predictive classification, presence of modules, comparison of alternative biomarker lists, outlier removal, control of selection bias by randomization techniques and enrichment analysis.

Availability: Supplementary Material and software are available at http://biodcv.fbk.eu/listspy.html

Contact: furlan@fbk.eu

Supplementary information: Supplementary data are available at Bioinformatics online.
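
To make the two central notions concrete, the following is a minimal sketch, not the authors' software. It assumes complete ranked lists over the same set of markers, takes the Canberra distance in its textbook form applied to rank positions (the sum over markers of |a - b| / (a + b)), and uses a plain Borda count in which a marker at position p of a list of length n earns n - p points. The function names (`canberra_distance`, `mean_pairwise_canberra`, `borda_aggregate`) and the example gene labels are hypothetical and chosen only for illustration; partial lists and any normalization used in the paper are not covered.

```python
from itertools import combinations
from typing import Dict, List, Sequence


def rank_positions(ranked: Sequence[str]) -> Dict[str, int]:
    """Map each marker to its 1-based position in a ranked list."""
    return {item: pos for pos, item in enumerate(ranked, start=1)}


def canberra_distance(list_a: Sequence[str], list_b: Sequence[str]) -> float:
    """Canberra distance between two complete rankings of the same markers.

    Assumption: the textbook Canberra form on rank positions,
    sum(|a - b| / (a + b)), so disagreements near the top of the
    lists weigh more than disagreements near the bottom.
    """
    pos_a = rank_positions(list_a)
    pos_b = rank_positions(list_b)
    if set(pos_a) != set(pos_b):
        raise ValueError("both lists must rank the same set of markers")
    return sum(
        abs(pos_a[m] - pos_b[m]) / (pos_a[m] + pos_b[m]) for m in pos_a
    )


def mean_pairwise_canberra(lists: Sequence[Sequence[str]]) -> float:
    """Illustrative stability indicator: mean distance over all list pairs."""
    pairs = list(combinations(lists, 2))
    if not pairs:
        raise ValueError("need at least two lists")
    return sum(canberra_distance(a, b) for a, b in pairs) / len(pairs)


def borda_aggregate(lists: Sequence[Sequence[str]]) -> List[str]:
    """Aggregate rankings by Borda count (position p of n earns n - p points).

    Ties are broken alphabetically so the result is deterministic.
    """
    scores: Dict[str, int] = {}
    for ranked in lists:
        n = len(ranked)
        for pos, item in enumerate(ranked, start=1):
            scores[item] = scores.get(item, 0) + (n - pos)
    return sorted(scores, key=lambda m: (-scores[m], m))


if __name__ == "__main__":
    # Hypothetical rankings, e.g. one per resampling run.
    runs = [
        ["g1", "g2", "g3"],
        ["g2", "g1", "g3"],
        ["g1", "g3", "g2"],
    ]
    print(mean_pairwise_canberra(runs))
    print(borda_aggregate(runs))
```

A lower mean pairwise distance indicates that the resampled lists agree more closely, and the Borda list gives a single consensus ranking that can be inspected alongside that indicator.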