Motivation: Microarray classification typically has two striking attributes: (1) classifier design and error estimation are based on remarkably small samples and (2) cross-validation error estimation is employed in the majority of papers. It is therefore necessary to have a quantitative understanding of how cross-validation behaves on very small samples.

Results: An extensive simulation study compares cross-validation, resubstitution and bootstrap estimation on both synthetic and real breast-cancer patient data for three popular classification rules: linear discriminant analysis, 3-nearest-neighbor and decision trees (CART). The comparison is made via the distribution of differences between the estimated and true errors. Several statistics of this deviation distribution have been computed: mean (estimator bias), variance (estimator precision), root-mean-square error (which combines bias and variance) and quartile ranges, including outlier behavior. In general, while cross-validation error estimation is much less biased than resubstitution, it displays excessive variance, which makes individual estimates unreliable for small samples. Bootstrap methods perform better with respect to variance, but at a high computational cost and often with increased bias (albeit much less than with resubstitution).

Availability and Supplementary information: A companion web site can be accessed at http://ee.tamu.edu/~edward/cv_paper. It contains: (1) the complete set of tables and plots from the simulation study; (2) additional figures; (3) a compilation of references for microarray classification studies and (4) the source code used, with full documentation and examples.
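The deviation statistics described under Results can be illustrated with a small simulation. The sketch below is not the authors' code; it is a minimal illustration under several assumptions of our own: a hypothetical two-class Gaussian model (`sample`), a hand-written 3-nearest-neighbor rule, leave-one-out as the cross-validation variant, the out-of-bag ("zero") bootstrap as the bootstrap variant, and a large independent test set standing in for the true error. All function names, sample sizes and replicate counts are illustrative choices.

```python
import numpy as np

def knn3_predict(X_train, y_train, X_test):
    """Majority vote of the 3 nearest training points (labels are 0/1)."""
    # Pairwise squared Euclidean distances, shape (n_test, n_train).
    d = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d, axis=1)[:, :3]
    return (y_train[nearest].sum(axis=1) >= 2).astype(int)

def error(X_train, y_train, X_test, y_test):
    """Fraction of test points misclassified by 3NN trained on the training set."""
    return float(np.mean(knn3_predict(X_train, y_train, X_test) != y_test))

def sample(rng, n, dim=2, shift=1.0):
    """Assumed model: unit-variance Gaussian classes with means 0 and `shift`."""
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, dim)) + shift * y[:, None]
    return X, y

def resubstitution(X, y):
    """Error measured on the same points used for training."""
    return error(X, y, X, y)

def leave_one_out(X, y):
    """Leave-one-out cross-validation error."""
    n = len(y)
    wrong = 0
    for i in range(n):
        keep = np.arange(n) != i
        wrong += int(knn3_predict(X[keep], y[keep], X[i:i + 1])[0] != y[i])
    return wrong / n

def bootstrap_zero(X, y, rng, n_boot=20):
    """Train on a resample, test on the points left out of it, and average."""
    n = len(y)
    errs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        out = np.setdiff1d(np.arange(n), idx)
        if len(out) > 0:  # skip the rare resample that contains every point
            errs.append(error(X[idx], y[idx], X[out], y[out]))
    return float(np.mean(errs))

def simulate(n=20, n_trials=100, n_test=2000, seed=0):
    """Return bias, variance and RMSE of (estimated - true) error per estimator."""
    rng = np.random.default_rng(seed)
    X_test, y_test = sample(rng, n_test)  # large set approximating the true error
    dev = {"resub": [], "loo": [], "boot": []}
    for _ in range(n_trials):
        X, y = sample(rng, n)
        true_err = error(X, y, X_test, y_test)
        dev["resub"].append(resubstitution(X, y) - true_err)
        dev["loo"].append(leave_one_out(X, y) - true_err)
        dev["boot"].append(bootstrap_zero(X, y, rng) - true_err)
    stats = {}
    for name, values in dev.items():
        d = np.asarray(values)
        stats[name] = {
            "bias": float(d.mean()),
            "variance": float(d.var()),
            "rmse": float(np.sqrt((d ** 2).mean())),
        }
    return stats

if __name__ == "__main__":
    for name, s in simulate().items():
        print(name, s)
```

Because the variance is computed with the population formula, each estimator's squared RMSE equals its squared bias plus its variance, which is the decomposition the abstract refers to. The numbers produced depend entirely on the assumed model and settings and should not be read as reproducing the study's results.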