AUC: a statistically consistent and more discriminating measure than accuracy

  • Authors:
  • Charles X. Ling; Jin Huang; Harry Zhang

  • Affiliations:
  • Department of Computer Science, The University of Western Ontario, London, Ontario, Canada; Department of Computer Science, The University of Western Ontario, London, Ontario, Canada; Faculty of Computer Science, University of New Brunswick, Fredericton, NB, Canada

  • Venue:
  • IJCAI'03: Proceedings of the 18th International Joint Conference on Artificial Intelligence
  • Year:
  • 2003

Abstract

Predictive accuracy has been used as the main, and often only, evaluation criterion for the predictive performance of classification learning algorithms. In recent years, the area under the ROC (Receiver Operating Characteristics) curve, or simply AUC, has been proposed as an alternative single-number measure for evaluating learning algorithms. In this paper, we prove that AUC is a better measure than accuracy. More specifically, we present rigorous definitions of consistency and discriminancy for comparing two evaluation measures for learning algorithms. We then present empirical evaluations and a formal proof to establish that AUC is indeed statistically consistent and more discriminating than accuracy. Our result is significant because it is the first formal proof that AUC is a better measure than accuracy for evaluating learning algorithms.
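The notion of AUC being "more discriminating" than accuracy can be illustrated with a toy example that is not taken from the paper: two rankers that make the same number of threshold errors (and so tie on accuracy) can still be separated by AUC, because AUC scores the full ranking of positives above negatives. A minimal sketch, assuming scikit-learn's accuracy_score and roc_auc_score and made-up scores:

```python
# Hypothetical illustration (not from the paper): two rankers with identical
# accuracy at a 0.5 threshold but different AUC, so AUC discriminates
# between them where accuracy reports a tie.
from sklearn.metrics import accuracy_score, roc_auc_score

y_true = [1, 1, 1, 0, 0, 0]                  # three positives, three negatives
scores_a = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]    # ranker A's probability estimates
scores_b = [0.9, 0.6, 0.4, 0.8, 0.2, 0.1]    # ranker B's probability estimates

for name, scores in [("A", scores_a), ("B", scores_b)]:
    preds = [1 if s >= 0.5 else 0 for s in scores]   # threshold at 0.5
    acc = accuracy_score(y_true, preds)
    auc = roc_auc_score(y_true, scores)
    print(f"Ranker {name}: accuracy = {acc:.3f}, AUC = {auc:.3f}")

# Both rankers reach accuracy 0.667, but A ranks positives above negatives
# more often (AUC 0.889 vs 0.778), so AUC distinguishes them.
```

In this contrived setting accuracy yields a tie while AUC yields a strict ordering, which is the intuition behind the discriminancy property the abstract refers to; the paper's contribution is the formal definition and proof, not this example.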