Document categorization based on minimum loss of reconstruction information

  • Authors:
  • Juan Carlos Gomez; Marie-Francine Moens

  • Affiliations:
  • Department of Computer Science, Katholieke Universiteit Leuven, Heverlee, Belgium; Department of Computer Science, Katholieke Universiteit Leuven, Heverlee, Belgium

  • Venue:
  • MICAI'12 Proceedings of the 11th Mexican international conference on Advances in Computational Intelligence - Volume Part II
  • Year:
  • 2012

Abstract

In this paper we present and validate a novel approach to single-label multi-class document categorization. The approach relies on a statistical property of Principal Component Analysis (PCA): it minimizes the reconstruction error of the training documents used to compute a low-rank category transformation matrix. This matrix projects the training documents of a given category into a new low-rank space and then reconstructs them in the original space with minimum loss of information. The proposed method, called the Minimum Loss of Reconstruction Information (mLRI) classifier, extends this property and applies it to unseen documents. Experiments on three well-known multi-class datasets for text categorization highlight the stable and generally better performance of the proposed approach compared with other popular categorization methods.
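
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each category gets its own PCA basis computed from mean-centered training vectors, and that an unseen document is assigned to the category whose basis reconstructs it with the smallest Euclidean error. The abstract does not confirm either assumption. The function names (`fit_category_bases`, `reconstruction_error`, `predict`), the `rank` parameter, and the toy data are hypothetical.

```python
import numpy as np


def fit_category_bases(X, y, rank):
    """Compute one low-rank PCA basis per category (illustrative assumption).

    X: (n_docs, n_terms) document-term matrix, y: (n_docs,) category labels.
    Returns a dict mapping each label to (mean, W), with W of shape (n_terms, r).
    """
    models = {}
    for label in np.unique(y):
        Xc = X[y == label]
        mean = Xc.mean(axis=0)  # centering is an assumption of this sketch
        # Rows of Vt are the principal directions of the centered category data.
        _, _, Vt = np.linalg.svd(Xc - mean, full_matrices=False)
        models[label] = (mean, Vt[:rank].T)
    return models


def reconstruction_error(x, mean, W):
    """Project x into the low-rank space, map it back, and measure the loss."""
    centered = x - mean
    reconstructed = W @ (W.T @ centered)
    return float(np.linalg.norm(centered - reconstructed))


def predict(models, x):
    """Pick the category whose basis loses the least information about x."""
    return min(models, key=lambda label: reconstruction_error(x, *models[label]))


# Toy data: category 0 varies mostly along term 0, category 1 along term 1.
X = np.array([
    [1.0, 0.0, 0.0],
    [2.0, 0.1, 0.0],
    [3.0, 0.0, 0.1],
    [0.0, 1.0, 0.0],
    [0.1, 2.0, 0.0],
    [0.0, 3.0, 0.1],
])
y = np.array([0, 0, 0, 1, 1, 1])
models = fit_category_bases(X, y, rank=1)
```

In this sketch, calling `predict(models, np.array([2.5, 0.0, 0.0]))` compares the reconstruction error under each category's basis and returns the label with the smallest one. On real text data, `X` would typically be a weighted term matrix, and `rank` would be chosen by validation.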