Learning probabilistic decision trees for AUC

  • Authors:
  • Harry Zhang; Jiang Su

  • Affiliations:
  • Faculty of Computer Science, University of New Brunswick, P.O. Box 4400, Fredericton, NB, Canada E3B 5A3 (both authors)

  • Venue:
  • Pattern Recognition Letters - Special issue: ROC analysis in pattern recognition
  • Year:
  • 2006

Abstract

Accurate ranking, as measured by AUC (the area under the ROC curve), is crucial in many real-world applications. Most traditional learning algorithms, however, aim only at high classification accuracy. It has been observed that traditional decision trees produce good classification accuracy but poor probability estimates. Since the ranking generated by a decision tree is based on class probabilities, a probability estimation tree (PET) with accurate probability estimates is needed to yield high AUC. Some researchers ascribe the poor probability estimates of decision trees to the decision tree learning algorithms. In our observation, however, the representation also plays an important role. In this paper, we propose extending decision trees to represent a joint distribution and conditional independence, resulting in a model called conditional independence trees (CITrees), which is better suited to yielding high AUC. We propose a novel AUC-based algorithm for learning CITrees, and our experiments show that the CITree algorithm outperforms the state-of-the-art decision tree learning algorithm C4.4 (a variant of C4.5), naive Bayes, and NBTree in AUC. Our work provides an effective model and algorithm for applications in which accurate ranking is required.
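To make the link between leaf probability estimates and ranking concrete, the sketch below (Python; not the authors' code, and every name in it is an illustrative assumption) shows the two quantities the abstract connects: a Laplace-corrected class-probability estimate of the kind PETs such as C4.4 use at their leaves, and AUC computed as the Mann-Whitney rank statistic, i.e. the probability that a randomly chosen positive example is ranked above a randomly chosen negative one.

```python
# Minimal sketch, assuming a binary classification task.
# All function and variable names are illustrative, not from the paper.

def laplace_leaf_probability(n_pos, n_total, n_classes=2):
    """Laplace-corrected estimate of P(positive | leaf).

    A raw estimate n_pos / n_total collapses to 0 or 1 at pure leaves,
    making all examples in such a leaf tie in the ranking; the Laplace
    correction smooths the estimate toward the uniform prior.
    """
    return (n_pos + 1) / (n_total + n_classes)

def auc(y_true, scores):
    """AUC via the Mann-Whitney U statistic.

    Counts, over all positive/negative pairs, how often the positive
    example receives the higher score (ties count 1/2).
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Two leaves, one pure and one mixed: Laplace correction turns the pure
# leaf's raw estimate of 1.0 into a finite, better-calibrated score.
for n_pos, n_tot in [(5, 5), (3, 4)]:      # (positives, total) per leaf
    print(laplace_leaf_probability(n_pos, n_tot))   # 0.857..., 0.666...

y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(auc(y_true, scores))                 # 8/9 = 0.888...
```

In the toy ranking above, the only inversion is the positive example scored 0.6 falling below the negative scored 0.7, so 8 of the 9 positive/negative pairs are ordered correctly, giving AUC = 8/9.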