A MFoM learning approach to robust multiclass multi-label text categorization

  • Authors:
  • Sheng Gao;Wen Wu;Chin-Hui Lee;Tat-Seng Chua

  • Affiliations:
  • Institute for Infocomm Research, Singapore;Carnegie Mellon University, Pittsburgh, PA;Georgia Institute of Technology, Atlanta, GA;National University of Singapore, Singapore

  • Venue:
  • ICML '04 Proceedings of the twenty-first international conference on Machine learning
  • Year:
  • 2004

Abstract

We propose a multiclass (MC) classification approach to text categorization (TC). To fully take advantage of both positive and negative training examples, a maximal figure-of-merit (MFoM) learning algorithm is introduced to train high performance MC classifiers. In contrast to conventional binary classification, the proposed MC scheme assigns a uniform score function to each category for each given test sample, and thus the classical Bayes decision rules can now be applied. Since all the MC MFoM classifiers are simultaneously trained, we expect them to be more robust and work better than the binary MFoM classifiers, which are trained separately and are known to give the best TC performance. Experimental results on the Reuters-21578 TC task indicate that the MC MFoM classifiers achieve a micro-averaging F1 value of 0.377, which is significantly better than 0.138, obtained with the binary MFoM classifiers, for the categories with fewer than 4 training samples. Furthermore, for all 90 categories, most with large training sizes, the MC MFoM classifiers give a micro-averaging F1 value of 0.888, better than 0.884, obtained with the binary MFoM classifiers.
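
The abstract reports its results as micro-averaged F1. As a reminder of what that metric measures in a multi-label setting, the following is a minimal sketch that pools true positives, false positives, and false negatives over all categories before computing precision and recall. The function name `micro_f1` and the toy label sets are assumptions made for illustration; they are not the paper's code or data, and the sketch does not implement MFoM training itself.

```python
from typing import List, Set


def micro_f1(true_labels: List[Set[str]], predicted_labels: List[Set[str]]) -> float:
    """Micro-averaged F1 for multi-label predictions.

    Counts are pooled across every document and category, so frequent
    categories contribute more to the score than rare ones.
    """
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        tp += len(truth & pred)   # labels correctly assigned
        fp += len(pred - truth)   # labels assigned but not correct
        fn += len(truth - pred)   # correct labels that were missed
    if tp == 0:
        return 0.0                # avoids division by zero when nothing is correct
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: three documents, each with a set of category labels.
truth = [{"earn"}, {"acq", "trade"}, {"grain"}]
pred = [{"earn"}, {"acq"}, {"grain", "wheat"}]
score = micro_f1(truth, pred)
print(f"micro-averaged F1 = {score:.3f}")
```

Because the counts are pooled, a score over all 90 categories is dominated by the large categories, which is presumably why the abstract also reports the small-category subset separately.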