Measures of Diversity in Classifier Ensembles and Their Relationship with the Ensemble Accuracy

  • Authors:
  • Ludmila I. Kuncheva
  • Christopher J. Whitaker

  • Affiliations:
  • School of Informatics, University of Wales, Bangor, Dean Street, Bangor, Gwynedd, LL57 1UT, UK. l.i.kuncheva@bangor.ac.uk
  • School of Informatics, University of Wales, Bangor, Dean Street, Bangor, Gwynedd, LL57 1UT, UK. c.j.whitaker@bangor.ac.uk

  • Venue:
  • Machine Learning
  • Year:
  • 2003

Abstract

Diversity among the members of a team of classifiers is deemed to be a key issue in classifier combination. However, measuring diversity is not straightforward because there is no generally accepted formal definition. We have found and studied ten statistics which can measure diversity among binary classifier outputs (correct or incorrect vote for the class label): four averaged pairwise measures (the Q statistic, the correlation, the disagreement and the double fault) and six non-pairwise measures (the entropy of the votes, the difficulty index, the Kohavi-Wolpert variance, the interrater agreement, the generalized diversity, and the coincident failure diversity). Four experiments have been designed to examine the relationship between the accuracy of the team and the measures of diversity, and among the measures themselves. Although there are proven connections between diversity and accuracy in some special cases, our results raise some doubts about the usefulness of diversity measures in building classifier ensembles in real-life pattern recognition problems.
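To make the pairwise measures concrete, the sketch below computes three of them (the Q statistic, the disagreement measure, and the double-fault measure) from "oracle" outputs, i.e. binary vectors recording whether each classifier voted correctly on each sample. The formulas follow the standard 2x2 contingency-table definitions for a pair of classifiers; the sample data is illustrative and not taken from the paper's experiments.

```python
def pairwise_counts(a, b):
    """Joint correct/incorrect counts for two classifiers' oracle outputs."""
    n11 = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)  # both correct
    n00 = sum(1 for x, y in zip(a, b) if x == 0 and y == 0)  # both incorrect
    n10 = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)  # only a correct
    n01 = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)  # only b correct
    return n11, n00, n10, n01

def q_statistic(a, b):
    """Yule's Q: ranges over [-1, 1]; 0 for statistically independent pairs."""
    n11, n00, n10, n01 = pairwise_counts(a, b)
    return (n11 * n00 - n01 * n10) / (n11 * n00 + n01 * n10)

def disagreement(a, b):
    """Fraction of samples on which exactly one of the two is correct."""
    n11, n00, n10, n01 = pairwise_counts(a, b)
    return (n01 + n10) / (n11 + n00 + n01 + n10)

def double_fault(a, b):
    """Fraction of samples on which both classifiers are wrong."""
    n11, n00, n10, n01 = pairwise_counts(a, b)
    return n00 / (n11 + n00 + n01 + n10)

# Two classifiers scored on ten samples (1 = correct vote, 0 = incorrect).
c1 = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]
c2 = [1, 0, 1, 1, 1, 0, 0, 1, 1, 1]
print(q_statistic(c1, c2))   # low positive Q: weakly dependent pair
print(disagreement(c1, c2))  # 0.4
print(double_fault(c1, c2))  # 0.1
```

For an ensemble of L classifiers, the paper averages each pairwise measure over all L(L-1)/2 pairs; the non-pairwise measures (entropy, difficulty index, etc.) are instead computed from the per-sample count of correct votes.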