Motivation: In genomic studies, thousands of features are collected on relatively few samples. A common goal of these studies is to build classifiers that predict the outcome of future observations. This process involves three inherent steps: feature selection, model selection and prediction assessment. Focusing on prediction assessment, we compare several methods for estimating the 'true' prediction error of a prediction model in the presence of feature selection.

Results: For small studies in which features are selected from thousands of candidates, the resubstitution and simple split-sample estimates are seriously biased. In these small samples, leave-one-out cross-validation (LOOCV), 10-fold cross-validation (CV) and the .632+ bootstrap have the smallest bias for diagonal discriminant analysis, nearest neighbor and classification trees; LOOCV and 10-fold CV have the smallest bias for linear discriminant analysis. LOOCV, 5- and 10-fold CV and the .632+ bootstrap also have the lowest mean squared error. The .632+ bootstrap is, however, quite biased in small samples with strong signal-to-noise ratios. Differences in performance among the resampling methods shrink as the number of available specimens increases.

Contact: annette.molinaro@yale.edu

Supplementary Information: A complete compilation of results and the R code for the simulations and analyses are available in Molinaro et al. (2005) (http://linus.nci.nih.gov/brb/TechReport.htm).
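The abstract's main point is that selecting features on all samples makes resubstitution optimistic. Resampling estimates avoid this only if they redo the selection on every training set. The Python sketch below illustrates that point under stated assumptions and is not the authors' R code. It assumes a two-class problem with labels 0/1. It uses a hypothetical t-like filter (`select_features`) and a nearest-centroid rule as a stand-in classifier. All function names are illustrative. The .632+ estimate uses the usual weighting w = .632 / (1 - .368 R), where R is the relative overfitting rate computed from the no-information error rate.

```python
# Illustrative sketch: hypothetical helper names, not the study's R code.
import math
import random


def _mean_var(v):
    """Mean and population variance of a list of floats."""
    mu = sum(v) / len(v)
    return mu, sum((t - mu) ** 2 for t in v) / len(v)


def select_features(X, y, m):
    """Hypothetical filter: keep the m features with the largest |t|-like statistic."""
    scores = []
    for j in range(len(X[0])):
        a = [x[j] for x, lab in zip(X, y) if lab == 0]
        b = [x[j] for x, lab in zip(X, y) if lab == 1]
        (ma, va), (mb, vb) = _mean_var(a), _mean_var(b)
        scores.append(abs(ma - mb) / math.sqrt(va / len(a) + vb / len(b) + 1e-12))
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:m]


def fit(X, y, m):
    """Feature selection followed by a nearest-centroid rule (a simple stand-in classifier)."""
    feats = select_features(X, y, m)
    cents = {}
    for c in (0, 1):
        rows = [x for x, lab in zip(X, y) if lab == c]
        cents[c] = [sum(r[j] for r in rows) / len(rows) for j in feats]
    return feats, cents


def predict(model, x):
    feats, cents = model

    def dist(c):  # squared Euclidean distance to the class centroid on selected features
        return sum((x[j] - mu) ** 2 for j, mu in zip(feats, cents[c]))

    return min(cents, key=dist)


def resubstitution_error(X, y, m):
    """Train and test on the same samples: optimistic when features are selected."""
    model = fit(X, y, m)
    return sum(predict(model, x) != lab for x, lab in zip(X, y)) / len(y)


def cv_error(X, y, k, m, seed=0):
    """k-fold CV with feature selection redone inside every fold; k = len(y) gives LOOCV."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    wrong = 0
    for f in range(k):
        test = idx[f::k]
        held = set(test)
        train = [i for i in idx if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train], m)
        wrong += sum(predict(model, X[i]) != y[i] for i in test)
    return wrong / len(y)


def bootstrap_632plus(X, y, m, B=50, seed=0):
    """.632+ bootstrap estimate with feature selection repeated inside each resample."""
    n, rng = len(y), random.Random(seed)
    preds = [predict(fit(X, y, m), x) for x in X]
    err_bar = sum(p != t for p, t in zip(preds, y)) / n                     # resubstitution error
    gamma = sum(y.count(c) / n * (1 - preds.count(c) / n) for c in (0, 1))  # no-information rate
    oob = [[] for _ in range(n)]
    for _ in range(B):
        boot = [rng.randrange(n) for _ in range(n)]
        yb = [y[i] for i in boot]
        if len(set(yb)) < 2:                        # simplification: skip one-class resamples
            continue
        model = fit([X[i] for i in boot], yb, m)
        for i in set(range(n)) - set(boot):         # score out-of-bag samples only
            oob[i].append(predict(model, X[i]) != y[i])
    per_obs = [sum(e) / len(e) for e in oob if e]
    err1 = sum(per_obs) / len(per_obs)              # leave-one-out bootstrap error
    err1p = min(err1, gamma)
    r = (err1p - err_bar) / (gamma - err_bar) if err1 > err_bar and gamma > err_bar else 0.0
    w = 0.632 / (1 - 0.368 * r)                     # weight grows with relative overfitting r
    return (1 - w) * err_bar + w * err1p


if __name__ == "__main__":
    rng = random.Random(1)
    y = [i % 2 for i in range(20)]                                  # 20 balanced samples
    X = [[rng.gauss(0, 1) for _ in range(500)] for _ in range(20)]  # pure noise: true error 0.5
    print("resubstitution :", resubstitution_error(X, y, m=10))
    print("5-fold CV      :", cv_error(X, y, k=5, m=10))
    print("LOOCV          :", cv_error(X, y, k=len(y), m=10))
    print(".632+ bootstrap:", bootstrap_632plus(X, y, m=10))
```

In the demo the labels are independent of the features, so the true error is 0.5. On such pure-noise data, resubstitution usually reports an error close to zero, while 5-fold CV and LOOCV stay near 0.5. Skipping one-class bootstrap resamples is a simplifying choice made in this sketch, not something the abstract specifies.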