Replicability of machine learning experiments measures how likely it is that the outcome of one experiment is repeated when the experiment is performed with a different randomization of the data. In this paper, we present an efficient estimator of the replicability of an experiment: it is unbiased and has the lowest variance among all estimators formed as a linear combination of the outcomes of experiments on a given data set.

We gathered empirical data to compare experiments consisting of different sampling schemes and hypothesis tests. Both factors are shown to affect the replicability of experiments. The data suggest that sign tests should not be used because of their low replicability. Rank sum tests perform better, but the combination of a sorted-runs sampling scheme with a t-test gives the most desirable performance, judged on Type I error, Type II error, and replicability.
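To make the notion concrete, replicability can be estimated empirically by repeating an experiment several times with different random seeds and counting how often two independent runs reach the same conclusion. The Python sketch below illustrates this with the simple pairwise-agreement estimator, which is unbiased but is not the minimum-variance estimator derived in the paper; the simulated experiment (a paired t-test on per-fold accuracy differences under 10-fold cross-validation, with the fold differences drawn from an invented normal distribution) is a hypothetical stand-in for a real classifier comparison.

```python
import math
import random
from itertools import combinations


def run_experiment(seed, n_folds=10):
    """One randomized experiment: returns True if the paired t-test
    rejects the null hypothesis of equal classifier performance."""
    rng = random.Random(seed)
    # Simulated per-fold accuracy differences between two classifiers;
    # the mean and spread are made-up values for illustration only.
    diffs = [rng.gauss(0.01, 0.02) for _ in range(n_folds)]
    m = sum(diffs) / n_folds
    s = math.sqrt(sum((d - m) ** 2 for d in diffs) / (n_folds - 1))
    t = m / (s / math.sqrt(n_folds))
    # Two-sided 0.05 critical value of the t distribution with 9 d.o.f.
    return abs(t) > 2.262


def replicability(outcomes):
    """Fraction of pairs of runs with the same outcome: an unbiased
    (though not minimum-variance) estimator of replicability."""
    agreements = sum(a == b for a, b in combinations(outcomes, 2))
    return agreements / math.comb(len(outcomes), 2)


outcomes = [run_experiment(seed) for seed in range(20)]
print(f"rejections: {sum(outcomes)}/20, "
      f"estimated replicability: {replicability(outcomes):.2f}")
```

Counting agreeing pairs is unbiased because each pair of independent runs agrees with probability exactly equal to the replicability; the paper's contribution is a linear-combination estimator that attains the lowest variance in that class.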