Text representation in multi-label classification: two new input representations

  • Authors:
  • Rodrigo Alfaro; Héctor Allende

  • Affiliations:
  • Universidad Técnica Federico Santa María, Chile and Pontificia Universidad Católica de Valparaíso, Chile; Universidad Técnica Federico Santa María, Chile and Universidad Adolfo Ibáñez, Chile

  • Venue:
  • ICANNGA'11: Proceedings of the 10th International Conference on Adaptive and Natural Computing Algorithms, Volume Part II
  • Year:
  • 2011

Abstract

Automatic text classification is the task of assigning unseen documents to a predefined set of classes. Text representation for classification has traditionally relied on the vector space model because of its simplicity and good performance. Multi-label text classification, in turn, has typically been addressed either by transforming the problem so that binary techniques can be applied or by adapting binary algorithms to work with multiple labels. In this paper we present two new representations for text documents, based on label-dependent term weighting, for multi-label classification. Rather than changing the problem or the algorithm, we focus on modifying the input. Performance was tested on a well-known dataset and compared against alternative techniques. Experimental results based on Hamming loss analysis show an improvement over the alternative approaches.
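
For readers unfamiliar with the evaluation measure named in the abstract, the following is a minimal sketch of Hamming loss as it is commonly defined for multi-label classification: the fraction of (document, label) pairs that are predicted incorrectly. It is not taken from the paper; the function name `hamming_loss`, its parameters, and the toy labels are assumptions chosen for illustration.

```python
from typing import Sequence, Set


def hamming_loss(
    true_labels: Sequence[Set[str]],
    predicted_labels: Sequence[Set[str]],
    label_space: Set[str],
) -> float:
    """Fraction of (document, label) pairs that are predicted incorrectly.

    Illustrative helper (hypothetical name and signature, not from the paper).
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("true_labels and predicted_labels must have equal length")
    if not true_labels or not label_space:
        raise ValueError("need at least one document and one label")

    errors = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        # The symmetric difference holds labels that are either missed or
        # wrongly assigned; restrict it to the known label space.
        errors += len((truth ^ predicted) & label_space)

    return errors / (len(true_labels) * len(label_space))


# Toy example with made-up labels: each document has one wrong label
# decision out of three, so 2 errors over 2 * 3 pairs.
labels = {"sports", "politics", "economy"}
truth = [{"sports"}, {"politics", "economy"}]
predicted = [{"sports", "politics"}, {"economy"}]
loss = hamming_loss(truth, predicted, labels)
```

Lower values are better, and a perfect prediction yields 0.0. Because the measure averages over every label of every document, it penalizes a missed label and a spurious label equally.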