TreeBoost.MH: a boosting algorithm for multi-label hierarchical text categorization

  • Authors:
  • Andrea Esuli; Tiziano Fagni; Fabrizio Sebastiani

  • Affiliations:
  • Istituto di Scienza e Tecnologia dell’Informazione, Consiglio Nazionale delle Ricerche, Pisa, Italy (all three authors)

  • Venue:
  • SPIRE'06 Proceedings of the 13th international conference on String Processing and Information Retrieval
  • Year:
  • 2006

Abstract

In this paper we propose TreeBoost.MH, an algorithm for multi-label Hierarchical Text Categorization (HTC) that is a hierarchical variant of AdaBoost.MH. TreeBoost.MH embodies several intuitions that had previously arisen within HTC, e.g. that both feature selection and the selection of negative training examples should be performed “locally”, i.e. by paying attention to the topology of the classification scheme. It also embodies the novel intuition that the weight distribution that boosting algorithms update at every boosting round should likewise be updated “locally”. We present the results of experiments with TreeBoost.MH on two HTC benchmarks, and analytically discuss its computational cost.
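
To make the idea of selecting negative training examples "locally" more concrete, the following is a minimal Python sketch under stated assumptions. It is not the authors' implementation. It assumes one common reading of "local": for a category, the positives are the documents labelled with that category or one of its descendants, and the negatives are drawn only from documents under its sibling categories. The hierarchy, the corpus, and the names `PARENT`, `LABELS`, `docs_under`, and `local_training_set` are all hypothetical and exist only for illustration.

```python
# Hypothetical toy hierarchy: child -> parent (None marks a top-level category).
PARENT = {
    "science": None,
    "sports": None,
    "physics": "science",
    "biology": "science",
    "soccer": "sports",
}

# Hypothetical toy corpus: document id -> set of categories (multi-label).
LABELS = {
    "d1": {"physics"},
    "d2": {"biology"},
    "d3": {"physics", "biology"},
    "d4": {"soccer"},
}


def ancestors_or_self(cat):
    """Yield cat and then each of its ancestors up to the top level."""
    while cat is not None:
        yield cat
        cat = PARENT[cat]


def docs_under(cat):
    """Documents labelled with cat or with any descendant of cat."""
    return {
        doc
        for doc, cats in LABELS.items()
        if any(cat in ancestors_or_self(c) for c in cats)
    }


def local_training_set(cat):
    """Assumed 'local' policy: negatives come only from sibling categories."""
    siblings = [c for c, p in PARENT.items() if p == PARENT[cat] and c != cat]
    positives = docs_under(cat)
    # Documents under some sibling of cat that are not positives of cat.
    negatives = set().union(*(docs_under(s) for s in siblings)) - positives
    return positives, negatives
```

Under this assumed policy, the training set for `physics` ignores the `soccer` document entirely, since it lies outside the `science` subtree: the classifier for `physics` only has to separate it from its sibling `biology`. A non-local policy would instead use every document not labelled `physics` as a negative example.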