On the scalability of ordered multi-class ROC analysis

  • Authors:
  • Willem Waegeman; Bernard De Baets; Luc Boullart

  • Affiliations:
  • Department of Electrical Energy, Systems and Automation, Ghent University, Technologiepark 913, B-9052 Ghent, Belgium; Department of Applied Mathematics, Biometrics and Process Control, Ghent University, Coupure links 653, B-9000 Ghent, Belgium; Department of Electrical Energy, Systems and Automation, Ghent University, Technologiepark 913, B-9052 Ghent, Belgium

  • Venue:
  • Computational Statistics & Data Analysis
  • Year:
  • 2008

Abstract

Receiver operating characteristics (ROC) analysis provides a way to select possibly optimal models for discriminating two kinds of objects without the need to specify the misclassification costs or the class distribution. It is now established as a standard analysis tool in several domains, including medical decision making, pattern recognition and machine learning. Recently, an extension to the ordered multi-class case has been proposed, in which the concept of an ROC curve is generalized to an r-dimensional surface for r ordered categories, and the volume under this ROC surface (VUS) measures the overall power of a model to classify objects of the various categories. However, computing this criterion, as well as the U-statistic estimators of its variance and of the covariance between two models, is believed to be computationally complex. New algorithms to compute VUS and its (co)variance estimators are presented. In particular, the volume under the ROC surface can be computed very efficiently with a simple dynamic program whose cost is dominated by a single sort of the data set. For the variance and covariance, the respective estimators are reformulated as a series of recurrent functions over layered data graphs, which are then evaluated rapidly with a dynamic program. Simulation experiments confirm that the presented algorithms scale well with respect to both the size of the data set and the number of categories. For example, the volume under the ROC surface could be computed rapidly on very large data sets of more than 500 000 instances, whereas a naive implementation took much longer on data sets of fewer than 1000 instances.
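To make the "single sort plus dynamic program" idea concrete, here is a minimal Python sketch; it is not the authors' code. It assumes that VUS is the fraction of r-tuples, one object drawn from each category, whose model scores strictly increase with the category index; that labels are the integers 0..r-1; and that tied scores receive no credit (the paper's treatment of ties may differ). The function names `vus_naive` and `vus_dp` are hypothetical. The brute force enumerates every tuple, while the sorted version keeps, for each category k, a running count of increasing chains that end in category k.

```python
from itertools import product
from math import prod
from typing import Sequence


def _class_sizes(labels: Sequence[int], r: int) -> list[int]:
    """Count objects per category; assumes labels are ints in 0..r-1."""
    sizes = [0] * r
    for y in labels:
        sizes[y] += 1
    if 0 in sizes:
        raise ValueError("every category 0..r-1 needs at least one object")
    return sizes


def vus_naive(scores: Sequence[float], labels: Sequence[int], r: int) -> float:
    """Brute force: check every r-tuple with one object from each category."""
    sizes = _class_sizes(labels, r)
    groups = [[s for s, y in zip(scores, labels) if y == k] for k in range(r)]
    hits = sum(
        all(t[k] < t[k + 1] for k in range(r - 1))  # strictly increasing scores
        for t in product(*groups)
    )
    return hits / prod(sizes)


def vus_dp(scores: Sequence[float], labels: Sequence[int], r: int) -> float:
    """Sort once, then count increasing chains in a single linear pass."""
    sizes = _class_sizes(labels, r)
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    # chains[k]: number of tuples (x_0, ..., x_k), x_m from category m, with
    # strictly increasing scores, whose last element x_k was already visited.
    chains = [0] * r
    i = 0
    while i < len(order):
        # Gather the block of objects sharing the current score.
        j = i
        while j < len(order) and scores[order[j]] == scores[order[i]]:
            j += 1
        # Increments use chain counts from strictly smaller scores only,
        # so tied objects never extend each other's chains.
        inc: dict[int, int] = {}
        for t in range(i, j):
            y = labels[order[t]]
            inc[y] = inc.get(y, 0) + (1 if y == 0 else chains[y - 1])
        for y, v in inc.items():
            chains[y] += v
        i = j
    return chains[r - 1] / prod(sizes)


if __name__ == "__main__":
    scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]
    labels = [0, 0, 1, 1, 2, 2]
    print(vus_naive(scores, labels, 3), vus_dp(scores, labels, 3))  # → 0.5 0.5
```

After the O(n log n) sort, `vus_dp` touches each object once, whereas `vus_naive` enumerates n_0 × n_1 × ... × n_{r-1} tuples, which is why the brute force becomes impractical long before the sorted pass does. With r = 2 and no tied scores, the same count reduces to the ordinary area under the ROC curve.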