In many bioinformatics applications, it is important to assess and compare the performance of algorithms trained from data, so that the conclusions drawn are unaffected by chance and are therefore statistically significant. Both the design of such experiments and the analysis of the resulting data with statistical tests must be done carefully for the results to carry significance. In this paper, we first review the performance measures used in classification, the basics of experiment design, and statistical tests. We then give the results of our survey over 1,500 papers published in the last two years in three bioinformatics journals (including this one). Although the basics of experiment design are well understood, such as resampling instead of using a single training set and using performance metrics other than error, only 21 percent of the papers use any statistical test for comparison. In the third part, we analyze four scenarios that we encounter frequently in the bioinformatics literature, discussing the proper statistical methodology and presenting an example case study for each. With the supplementary software, we hope that the guidelines we discuss will play an important role in future studies.
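As an illustration of the kind of statistical comparison the abstract refers to, the following is a minimal sketch of a paired t-test on per-fold error rates from a 10-fold cross-validation. It is not taken from the paper or its supplementary software: the function names, the fold error values, and the choice of the paired t-test itself are assumptions made for illustration. The critical value 2.262 is the two-sided 5 percent threshold of the t distribution with 9 degrees of freedom, so it applies only when there are exactly 10 folds.

```python
import math
from statistics import mean, stdev

# Two-sided 5% critical value of the t distribution with 9 degrees of freedom.
# Valid only for k = 10 paired folds; other fold counts need another value.
T_CRITICAL_DF9 = 2.262


def paired_t_statistic(errors_a, errors_b):
    """Return the paired t statistic for two lists of per-fold error rates.

    Both classifiers are assumed to have been evaluated on the same folds,
    so errors_a[i] and errors_b[i] refer to the same validation fold.
    """
    if len(errors_a) != len(errors_b):
        raise ValueError("both classifiers need one error rate per fold")
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    k = len(diffs)
    if k < 2:
        raise ValueError("at least two folds are required")
    m = mean(diffs)
    s = stdev(diffs)  # sample standard deviation (k - 1 in the denominator)
    if s == 0:
        # All differences are identical: no variation between folds.
        return 0.0 if m == 0 else math.copysign(math.inf, m)
    return m / (s / math.sqrt(k))


def is_significant(t_statistic, critical_value=T_CRITICAL_DF9):
    """Two-sided decision: reject equal expected error if |t| exceeds the threshold."""
    return abs(t_statistic) > critical_value


if __name__ == "__main__":
    # Hypothetical error rates of two classifiers on the same 10 folds.
    errors_a = [0.21, 0.19, 0.24, 0.20, 0.22, 0.18, 0.23, 0.21, 0.20, 0.22]
    errors_b = [0.18, 0.17, 0.20, 0.19, 0.18, 0.17, 0.19, 0.18, 0.18, 0.19]
    t = paired_t_statistic(errors_a, errors_b)
    print(f"t = {t:.3f}, significant at 5%: {is_significant(t)}")
```

Pairing the folds matters here: because both classifiers see the same training and validation splits, the test works on the per-fold differences, which removes the variation between folds that is common to both. Note that error rates from overlapping cross-validation training sets are not independent, so a test like this one is an approximation.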