Subpopulation-specific confidence designation for more informative biomedical classification

Authors:
Chuanlei Zhang; Ralph L. Kodell
Affiliations:
-;-
Venue:
Artificial Intelligence in Medicine
Year:
2013

Abstract

Objective: Although classification algorithms are promising tools to support clinical diagnosis and treatment of disease, the usual implicit assumption underlying these algorithms, that all patients are homogeneous with respect to characteristics of interest, is unsatisfactory. The objective here is to exploit the population heterogeneity reflected by characteristics that may not be apparent and thus not controlled, in order to differentiate levels of classification accuracy between subpopulations and further the goal of tailoring therapies on an individual basis.

Methods and materials: A new subpopulation-based confidence approach is developed in the context of a selective voting algorithm defined by an ensemble of convex-hull classifiers. Populations of training samples are divided into three subpopulations that are internally homogeneous, with different levels of predictivity. Two different distance measures are used to cluster training samples into subpopulations and assign test samples to these subpopulations.

Results: Validation of the new approach's levels of confidence of classification is carried out using six publicly available datasets. Our approach demonstrates a positive correspondence between the predictivity designations derived from training samples and the classification accuracy of test samples. The average difference between highest- and lowest-confidence accuracies for the six datasets is 17.8%, with a minimum of 11.3% and a maximum of 24.1%.

Conclusion: The classification accuracy increases as the designated confidence increases.
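
The assignment step described under Methods and materials, and the accuracy gap reported under Results, can be illustrated with a minimal sketch. The abstract does not name the two distance measures or the clustering procedure, so everything specific here is an assumption made for illustration: Euclidean distance, nearest-centroid assignment, the function names, and the toy data. It is not the authors' algorithm, and it leaves out the convex-hull ensemble entirely.

```python
from math import dist  # Euclidean distance, Python 3.8+


def centroid(points):
    """Mean of a list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def assign_confidence(test_point, centroids):
    """Return the label of the nearest subpopulation centroid.

    Euclidean nearest-centroid assignment is an assumption; the
    abstract only states that a distance measure is used.
    """
    return min(centroids, key=lambda label: dist(test_point, centroids[label]))


def accuracy_by_group(groups, correct):
    """Classification accuracy of test samples within each confidence group."""
    totals, hits = {}, {}
    for g, c in zip(groups, correct):
        totals[g] = totals.get(g, 0) + 1
        hits[g] = hits.get(g, 0) + int(c)
    return {g: hits[g] / totals[g] for g in totals}


# Hypothetical training samples, already split into three subpopulations
# with different levels of predictivity (toy values, not from the paper).
train = {
    "high": [(0.0, 0.0), (0.2, 0.1)],
    "medium": [(1.0, 1.0), (1.2, 0.9)],
    "low": [(2.0, 2.0), (2.1, 1.8)],
}
centroids = {label: centroid(pts) for label, pts in train.items()}

# Hypothetical test samples and whether the classifier got each one right.
test_points = [(0.1, 0.0), (1.1, 1.0), (2.0, 1.9), (0.3, 0.2)]
test_correct = [True, True, False, True]

groups = [assign_confidence(p, centroids) for p in test_points]
acc = accuracy_by_group(groups, test_correct)

# Difference between highest- and lowest-confidence accuracy, the
# quantity the abstract averages over six datasets.
gap = acc["high"] - acc["low"]
```

With real data, `gap` would be computed once per dataset and then averaged. The toy values above are chosen only so that the three groups are well separated.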