Text Categorization in Non-linear Semantic Space

  • Authors:
  • Claudio Biancalana; Alessandro Micarelli

  • Affiliations:
  • Department of Computer Science and Automation, Artificial Intelligence Laboratory, Roma Tre University, Via della Vasca Navale, 79, 00146 Rome, Italy (both authors)

  • Venue:
  • AI*IA '07 Proceedings of the 10th Congress of the Italian Association for Artificial Intelligence on AI*IA 2007: Artificial Intelligence and Human-Oriented Computing
  • Year:
  • 2007

Abstract

Automatic Text Categorization (TC) is a complex task that is useful for many natural language applications. It is usually performed using a set of manually classified documents, i.e. a training collection. Term-based document representations are widely used in TC. However, one of the main shortcomings of such methods is that they largely disregard lexical semantics and, as a consequence, are not sufficiently robust to variations in word usage. In this paper we design, implement, and evaluate a new text classification technique. Our main idea is to find a series of projections of the training data using a new, modified LSI (Latent Semantic Indexing) algorithm, to project all training instances onto the low-dimensional subspace found in that step, and finally to induce a binary search on the projected low-dimensional data. We conclude that, despite its simplicity and efficiency, our approach achieves classification accuracy comparable to that of SVMs.
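The abstract does not spell out the authors' modified LSI algorithm or the binary search step. As a point of reference only, the sketch below shows plain, unmodified LSI: it finds the top-k left singular vectors U_k of a terms x documents matrix A (via power iteration on A A^T) and projects a document vector d onto that subspace as U_k^T d. The function names `lsi_basis` and `project`, the iteration count, and the toy vocabulary are hypothetical choices for illustration, not taken from the paper; a real system would use a library SVD routine instead.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale v to unit length; a zero vector is returned unchanged."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v] if n > 0 else v


def lsi_basis(term_doc, k, iters=200):
    """Top-k left singular vectors of a terms x documents matrix A.

    Plain (unmodified) LSI, hypothetical helper: the left singular vectors
    of A are the eigenvectors of the symmetric matrix C = A A^T, found here
    by power iteration, re-orthogonalising against vectors already found.
    """
    c = [[dot(ri, rj) for rj in term_doc] for ri in term_doc]  # C = A A^T
    basis = []
    for _ in range(k):
        # Deterministic, strictly positive starting vector (an arbitrary choice).
        v = normalize([1.0 + 0.1 * j for j in range(len(c))])
        for _ in range(iters):
            w = [dot(row, v) for row in c]  # w = C v
            for u in basis:  # remove components along earlier singular vectors
                p = dot(w, u)
                w = [wi - p * ui for wi, ui in zip(w, u)]
            v = normalize(w)
        basis.append(v)
    return basis


def project(basis, doc):
    """Coordinates of a term-frequency vector in the k-dim LSI subspace (U_k^T d)."""
    return [dot(u, doc) for u in basis]


if __name__ == "__main__":
    # Hypothetical toy collection. Terms: car, auto, engine, flower, petal.
    # Columns are four documents: "car engine", "auto engine",
    # "flower petal", "petal flower".
    A = [
        [1, 0, 0, 0],  # car
        [0, 1, 0, 0],  # auto
        [1, 1, 0, 0],  # engine
        [0, 0, 1, 1],  # flower
        [0, 0, 1, 1],  # petal
    ]
    basis = lsi_basis(A, k=2)
    print(project(basis, [1, 0, 0, 0, 0]))  # "car": close to [0.0, 0.408]
    print(project(basis, [0, 1, 0, 0, 0]))  # "auto": close to [0.0, 0.408]
```

In this toy example "car" and "auto" never co-occur, yet both land on the same point of the 2-dimensional subspace because each co-occurs with "engine". This is the kind of robustness to variation in word usage that the abstract says purely term-based representations lack.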