A Decomposition Scheme Based on Error-Correcting Output Codes for Ensembles of Text Categorisers

  • Authors:
  • J. J. García Adeva; R. Calvo

  • Affiliations:
  • University of Sydney; University of Sydney

  • Venue:
  • ICITA '05: Proceedings of the Third International Conference on Information Technology and Applications (ICITA'05), Volume 2
  • Year:
  • 2005

Abstract

Error-Correcting Output Codes (ECOC) are commonly used to decompose a multi-category problem into many dichotomies (binary problems). In this way, the text categorisation task is performed by an ensemble of binary classifiers rather than by a single monolithic classifier. The ensemble's performance depends largely on the characteristics of the decomposition. We propose a decomposition approach in which both the categories and the classifiers are well separated, in order to maximise the separation between decision boundaries and minimise correlated predictions. We apply this design to the El Mundo corpus (Spanish newspaper articles) and to the well-known ModApté split of the Reuters-21578 corpus. The results obtained with ensembles compare favourably with those obtained with a monolithic classifier.
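The abstract does not give the construction itself, so the following is only a minimal sketch of the general ECOC idea, not the authors' scheme. It swaps in the standard "exhaustive code" construction for illustration and measures the two properties the abstract names, under one common reading: category separation as the minimum Hamming distance between rows (codewords), and classifier separation as the minimum distance between columns (dichotomies), where a column is also compared with the complement of every other column because a complemented dichotomy trains an equivalent binary classifier. The helper names `exhaustive_code`, `row_separation`, `column_separation` and `decode` are hypothetical.

```python
from itertools import product


def exhaustive_code(k):
    """Exhaustive ECOC matrix for k >= 3 categories (illustrative construction).

    Each column is one dichotomy: categories with bit 1 form the positive
    super-class, those with bit 0 the negative one. Columns are all length-k
    bit vectors whose first bit is 1, minus the all-ones vector, so no column
    is constant and no column is the complement of another.
    """
    cols = [(1,) + rest for rest in product((0, 1), repeat=k - 1)]
    cols = [c for c in cols if 0 in c]  # drop the constant all-ones column
    # Transpose: row i is the codeword assigned to category i.
    return [[c[i] for c in cols] for i in range(k)]


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def row_separation(m):
    """Minimum Hamming distance between category codewords (rows)."""
    n = len(m)
    return min(hamming(m[i], m[j]) for i in range(n) for j in range(i + 1, n))


def column_separation(m):
    """Minimum distance between any two dichotomies (columns), counting a
    column and its complement as the same binary problem."""
    cols = list(zip(*m))
    best = len(m)
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            comp = tuple(1 - b for b in cols[j])
            best = min(best, hamming(cols[i], cols[j]), hamming(cols[i], comp))
    return best


def decode(m, predictions):
    """Return the category whose codeword is nearest (in Hamming distance)
    to the vector of binary classifier outputs."""
    return min(range(len(m)), key=lambda i: hamming(m[i], predictions))


if __name__ == "__main__":
    M = exhaustive_code(4)
    print(len(M[0]), row_separation(M), column_separation(M))  # 7 4 1
    noisy = list(M[2])
    noisy[0] ^= 1  # one binary classifier makes a mistake
    print(decode(M, noisy))  # 2
```

With four categories the exhaustive code uses seven binary classifiers and a row distance of 4, so any single classifier error is still decoded to the right category. Its column separation of 1, however, shows that it contains near-duplicate dichotomies that are likely to make correlated predictions; a decomposition that also targets well-separated classifiers, as the abstract describes, would need a different construction or search.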