Win percentage: a novel measure for assessing the suitability of machine classifiers for biological problems

Authors:
R. Mitchell Parry; John H. Phan; May D. Wang
Affiliations:
Emory University/Georgia Tech, Atlanta, GA; Emory University/Georgia Tech, Atlanta, GA; Emory University/Georgia Tech, Atlanta, GA
Venue:
Proceedings of the 2nd ACM Conference on Bioinformatics, Computational Biology and Biomedicine
Year:
2011

Abstract

Selecting an appropriate classifier for a particular biological application is a difficult problem for researchers and practitioners alike. We propose "win percentage," a novel measure for assessing the suitability of machine classifiers for a particular problem. We define win percentage as the probability that a classifier will perform better than its peers on a finite random sample of feature sets, giving each classifier equal opportunity to find suitable features. We illustrate the utility of this method using synthetic data. Then, we evaluate six classifiers on eight microarray datasets representing three diseases: breast cancer, multiple myeloma, and neuroblastoma. Fundamentally, we illustrate that the selection of the most suitable classifier (i.e., one that is more likely to perform better than its peers) depends not only on the dataset and application but also on the thoroughness of feature selection. Win percentage provides a single measurement that could assist users in eliminating or selecting classifiers for their particular application, and it will be accessible from www.biomiblab.org.
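
The definition of win percentage can be illustrated with a small Monte Carlo sketch. This is not the authors' implementation: the abstract does not specify the estimation procedure, so the function name `win_percentage`, the shared random sample per trial, the even splitting of ties, and the toy scores for classifiers "A" and "B" are all assumptions made for illustration. Each trial draws a finite random sample of feature sets, lets every classifier keep its best score within that sample, and credits the classifier with the highest best score.

```python
import random


def win_percentage(scores, sample_size, trials=10000, seed=0):
    """Estimate how often each classifier beats its peers.

    scores: dict mapping a classifier name to a list of performance
            values, one per candidate feature set (hypothetical data).
    sample_size: number of feature sets drawn per trial; a larger value
                 stands in for a more thorough feature selection.
    Assumption: all classifiers see the same random sample in a trial,
    and ties are split evenly among the tied classifiers.
    """
    rng = random.Random(seed)
    names = list(scores)
    n_sets = len(scores[names[0]])
    wins = {name: 0.0 for name in names}
    for _ in range(trials):
        # Draw feature-set indices without replacement.
        sample = rng.sample(range(n_sets), sample_size)
        # Each classifier keeps its best score over the sampled sets.
        best = {name: max(scores[name][i] for i in sample) for name in names}
        top = max(best.values())
        winners = [name for name in names if best[name] == top]
        for name in winners:
            wins[name] += 1.0 / len(winners)
    return {name: wins[name] / trials for name in names}


# Toy scores: "A" excels on one feature set, "B" is uniformly moderate.
scores = {
    "A": [0.9, 0.5, 0.5, 0.5],
    "B": [0.7, 0.7, 0.7, 0.7],
}

shallow = win_percentage(scores, sample_size=1)   # little feature selection
thorough = win_percentage(scores, sample_size=4)  # exhaustive on this toy set
print(shallow)
print(thorough)
```

In this toy setting, "B" wins most trials when only one feature set is sampled, while "A" wins every trial once all four sets are examined, which mirrors the abstract's point that the most suitable classifier depends on the thoroughness of feature selection.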