On the chance accuracies of large collections of classifiers

Authors:
Mark Palatucci;Andrew Carlson
Affiliations:
Carnegie Mellon University, Pittsburgh, PA;Carnegie Mellon University, Pittsburgh, PA
Venue:
Proceedings of the 25th international conference on Machine learning
Year:
2008

Citing 6

Multiple Comparisons in Induction Algorithms
Machine Learning

Gene selection criterion for discriminant microarray data analysis based on extreme value distributions
RECOMB '03 Proceedings of the seventh annual international conference on Research in computational molecular biology

Using a Permutation Test for Attribute Selection in Decision Trees
ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning

Rule-based anomaly pattern detection for detecting disease outbreaks
Eighteenth national conference on Artificial intelligence

An introduction to variable and feature selection
The Journal of Machine Learning Research

Learning to Decode Cognitive States from Brain Images
Machine Learning

Cited 1

A new feature selection algorithm based on binomial hypothesis testing for spam filtering
Knowledge-Based Systems

Abstract

We provide a theoretical analysis of the chance accuracies of large collections of classifiers. We show that on problems with small numbers of examples, some classifier can perform well by random chance, and we derive a theorem to explicitly calculate this accuracy. We use this theorem to provide a principled feature selection criterion for sparse, high-dimensional problems. We evaluate this method on microarray and fMRI datasets and show that it performs very close to the optimal accuracy obtained from an oracle. We also show that on the fMRI dataset this technique chooses relevant features successfully while another state-of-the-art method, the False Discovery Rate (FDR), completely fails at standard significance levels.
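
To illustrate the effect the abstract describes, here is a minimal sketch of how the best of many chance-level classifiers can look accurate when examples are few. It is not the paper's theorem, which this record does not reproduce. It assumes m independent classifiers, each correct on every one of n examples with probability p = 0.5, so that each classifier's number of correct predictions is Binomial(n, p). The function names are hypothetical and chosen only for this example.

```python
from math import comb


def binom_cdf(k, n, p=0.5):
    """P(X <= k) for X ~ Binomial(n, p)."""
    if k < 0:
        return 0.0
    upper = min(k, n)
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(upper + 1))


def prob_best_at_least(k, n, m, p=0.5):
    """P(at least one of m independent chance classifiers gets >= k of n right).

    Assumption: the classifiers are independent, which real feature-based
    classifiers usually are not.
    """
    return 1.0 - binom_cdf(k - 1, n, p) ** m


def expected_best_accuracy(n, m, p=0.5):
    """Expected accuracy of the best of m chance classifiers on n examples.

    Uses E[M] = sum over k = 1..n of P(M >= k) for the maximum count M.
    """
    expected_correct = sum(prob_best_at_least(k, n, m, p) for k in range(1, n + 1))
    return expected_correct / n


if __name__ == "__main__":
    # Compare a single chance classifier with the best of many on a small sample.
    for m in (1, 100, 10000):
        print(m, round(expected_best_accuracy(20, m), 3))
```

Under these assumptions, a single classifier has an expected accuracy of 0.5, and the expected accuracy of the best classifier grows as m increases or n shrinks. That is why a high observed accuracy in a sparse, high-dimensional problem is not by itself evidence that a feature is relevant.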