Design and Analysis of Classifier Learning Experiments in Bioinformatics: Survey and Case Studies

Authors: Ozan Irsoy; Olcay Taner Yildiz; Ethem Alpaydin
Affiliations: Bogazici University, Istanbul; Isik University, Istanbul; Bogazici University, Istanbul
Venue: IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Year: 2012

Abstract

In many bioinformatics applications, it is important to assess and compare the performances of algorithms trained from data so that the conclusions drawn are unaffected by chance and are therefore significant. Both the design of such experiments and the analysis of the resulting data using statistical tests should be done carefully for the results to carry significance. In this paper, we first review the performance measures used in classification and the basics of experiment design and statistical tests. We then give the results of our survey of over 1,500 papers published in the last two years in three bioinformatics journals (including this one). Although the basics of experiment design are well understood, such as resampling instead of using a single training set and using different performance metrics instead of error, only 21 percent of the papers use any statistical test for comparison. In the third part, we analyze four different scenarios that we encounter frequently in the bioinformatics literature, discussing the proper statistical methodology and showing an example case study for each. With the supplementary software, we hope that the guidelines we discuss will play an important role in future studies.
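As an illustration of the kind of statistical test the abstract refers to, the following is a minimal Python sketch of one option for comparing two classifiers on a single data set with resampling: the combined 5×2 cv F test. It assumes the differences in error rate between the two classifiers have already been measured on each of the two folds of five replications of twofold cross-validation. The function names are hypothetical, and the critical value 4.74 for the F distribution with (10, 5) degrees of freedom at alpha = 0.05 is taken from standard F tables rather than computed.

```python
from typing import Sequence

# Assumed critical value of F(10, 5) at alpha = 0.05, from standard F tables.
F_CRIT_10_5_ALPHA_05 = 4.74


def combined_5x2cv_f(diffs: Sequence[Sequence[float]]) -> float:
    """Return the combined 5x2 cv F statistic (hypothetical helper).

    diffs[i][j] is the difference in error rate between classifiers A and B
    on fold j (0 or 1) of replication i (0..4) of twofold cross-validation.
    """
    if len(diffs) != 5 or any(len(row) != 2 for row in diffs):
        raise ValueError("expected 5 replications x 2 folds")

    # Numerator: sum of all ten squared differences.
    numerator = sum(p * p for row in diffs for p in row)

    # Denominator: twice the sum of the per-replication variance estimates.
    variance_sum = 0.0
    for p1, p2 in diffs:
        mean = (p1 + p2) / 2.0
        variance_sum += (p1 - mean) ** 2 + (p2 - mean) ** 2
    if variance_sum == 0.0:
        raise ValueError("zero variance across folds; statistic undefined")

    return numerator / (2.0 * variance_sum)


def differ_significantly(
    diffs: Sequence[Sequence[float]], f_crit: float = F_CRIT_10_5_ALPHA_05
) -> bool:
    """Reject equal error rates if the statistic exceeds the critical value."""
    return combined_5x2cv_f(diffs) > f_crit


if __name__ == "__main__":
    # Hypothetical error-rate differences, not data from the paper.
    example = [[0.02, 0.04]] * 5
    print(combined_5x2cv_f(example), differ_significantly(example))
```

With these hypothetical differences, every replication has mean 0.03 and variance estimate 0.0002, so the statistic is 0.01 / (2 × 0.001) = 5.0. Because 5.0 exceeds 4.74, the hypothesis that the two classifiers have equal error rates would be rejected at the 0.05 level. Pooling the variance over all five replications, rather than using a single one, is what makes this statistic more stable than a test based on one split.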