Ensembles of balanced nested dichotomies for multi-class problems

  • Authors:
  • Lin Dong; Eibe Frank; Stefan Kramer

  • Affiliations:
  • Department of Computer Science, University of Waikato, New Zealand; Department of Computer Science, University of Waikato, New Zealand; Department of Computer Science, Technical University of Munich, Germany

  • Venue:
  • PKDD'05: Proceedings of the 9th European Conference on Principles and Practice of Knowledge Discovery in Databases
  • Year:
  • 2005

Abstract

A system of nested dichotomies is a hierarchical decomposition of a multi-class problem with c classes into c − 1 two-class problems and can be represented as a tree structure. Ensembles of randomly generated nested dichotomies have proven to be an effective approach to multi-class learning problems [1]. However, sampling trees so that every tree has equal probability means that the depth of a tree is bounded only by the number of classes, and very unbalanced trees can increase runtime. In this paper we investigate two approaches to building balanced nested dichotomies, class-balanced nested dichotomies and data-balanced nested dichotomies, and evaluate both in the same ensemble setting. Using C4.5 decision trees as the base models, we show that both approaches can reduce runtime with little or no effect on accuracy, especially on problems with many classes. We also investigate the effect of caching models when building ensembles of nested dichotomies.
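The tree structure described in the abstract can be illustrated with a small Python sketch. It builds nested dichotomies with three hypothetical split rules: `random_split` (an unconstrained random cut, which is simpler than and not identical to the equal-probability sampling over trees used in [1]), `class_balanced_split` (the two class subsets differ in size by at most one class), and `data_balanced_split` (a greedy rule assumed here for illustration that fills one subset until it holds about half of the instances; the paper's exact procedure may differ). The sketch checks that each tree contains c − 1 two-class problems, reports tree depth, and counts the distinct two-class problems across an ensemble, which is the kind of reuse that model caching could exploit. Training of the C4.5 base models is omitted, and the class counts are made up.

```python
import random
from typing import Callable, Dict, FrozenSet, List, Optional, Set, Tuple

Classes = List[int]
Counts = Dict[int, int]
SplitFn = Callable[[Classes, Counts, random.Random], Tuple[Classes, Classes]]


class Node:
    """Leaf (one class) or internal node holding a two-class problem that
    separates left.classes from right.classes."""

    def __init__(self, classes: FrozenSet[int],
                 left: Optional["Node"] = None,
                 right: Optional["Node"] = None) -> None:
        self.classes = classes
        self.left = left
        self.right = right


def random_split(classes: Classes, counts: Counts, rng: random.Random):
    # Unconstrained (hypothetical): shuffle, cut at a random position,
    # so tree depth may reach c - 1.
    rng.shuffle(classes)
    cut = rng.randint(1, len(classes) - 1)
    return classes[:cut], classes[cut:]


def class_balanced_split(classes: Classes, counts: Counts, rng: random.Random):
    # The two class subsets differ in size by at most one.
    rng.shuffle(classes)
    half = len(classes) // 2
    return classes[:half], classes[half:]


def data_balanced_split(classes: Classes, counts: Counts, rng: random.Random):
    # Hypothetical greedy rule: move shuffled classes into the left subset
    # until it holds about half of the instances (both sides stay non-empty).
    rng.shuffle(classes)
    total = sum(counts[k] for k in classes)
    left: Classes = []
    seen = 0
    for k in classes[:-1]:
        if left and seen >= total / 2:
            break
        left.append(k)
        seen += counts[k]
    return left, classes[len(left):]


def build(classes: Classes, counts: Counts, split: SplitFn,
          rng: random.Random) -> Node:
    # Recursively split until every leaf holds a single class.
    if len(classes) == 1:
        return Node(frozenset(classes))
    left, right = split(list(classes), counts, rng)
    return Node(frozenset(classes),
                build(left, counts, split, rng),
                build(right, counts, split, rng))


def internal_nodes(node: Node) -> List[Node]:
    # Each internal node corresponds to one two-class problem.
    if node.left is None or node.right is None:
        return []
    return [node] + internal_nodes(node.left) + internal_nodes(node.right)


def depth(node: Node) -> int:
    if node.left is None or node.right is None:
        return 0
    return 1 + max(depth(node.left), depth(node.right))


def distinct_problems(trees: List[Node]) -> Set[FrozenSet[FrozenSet[int]]]:
    # Identical two-class problems in different ensemble members could share
    # one cached model; key each problem by its unordered pair of subsets.
    keys = set()
    for tree in trees:
        for n in internal_nodes(tree):
            keys.add(frozenset([n.left.classes, n.right.classes]))
    return keys


if __name__ == "__main__":
    rng = random.Random(0)
    classes = list(range(16))
    counts = {k: 10 * (k + 1) for k in classes}  # made-up class sizes
    for name, split in [("random", random_split),
                        ("class-balanced", class_balanced_split),
                        ("data-balanced", data_balanced_split)]:
        trees = [build(classes, counts, split, rng) for _ in range(10)]
        print(f"{name}: {len(internal_nodes(trees[0]))} binary problems per tree, "
              f"max depth {max(depth(t) for t in trees)}, "
              f"{len(distinct_problems(trees))} distinct problems in ensemble")
```

With 16 classes, every class-balanced tree has depth ⌈log₂ 16⌉ = 4, whereas random trees can be deeper. Because each instance is used to train every two-class model on the path from the root to its class's leaf, shallower trees mean each instance appears in fewer training sets, which is one plausible source of the runtime savings.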