Decision trees for hierarchical multilabel classification: a case study in functional genomics

  • Authors:
  • Hendrik Blockeel;Leander Schietgat;Jan Struyf;Sašo Džeroski;Amanda Clare

  • Affiliations:
  • Department of Computer Science, Katholieke Universiteit Leuven, Leuven, Belgium;Department of Computer Science, Katholieke Universiteit Leuven, Leuven, Belgium;Department of Computer Science, Katholieke Universiteit Leuven, Leuven, Belgium;Department of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia;Department of Computer Science, University of Wales Aberystwyth, UK

  • Venue:
  • PKDD'06 Proceedings of the 10th European conference on Principle and Practice of Knowledge Discovery in Databases
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

Hierarchical multilabel classification (HMC) is a variant of classification where instances may belong to multiple classes organized in a hierarchy. The task is relevant for several application domains. This paper presents an empirical study of decision tree approaches to HMC in the area of functional genomics. We compare learning a single HMC tree (which makes predictions for all classes together) to learning a set of regular classification trees (one for each class). Interestingly, on all 12 datasets we use, the HMC tree wins on all fronts: it is faster to learn and to apply, easier to interpret, and has similar or better predictive performance than the set of regular trees. It turns out that HMC tree learning is more robust to overfitting than regular tree learning.