Text Classification on Embedded Manifolds

  • Authors:
  • Catarina Silva; Bernardete Ribeiro

  • Affiliations:
  • School of Technology and Management, Polytechnic Institute of Leiria, Portugal and Dep. Informatics Eng., Center Informatics and Systems, Univ. of Coimbra, Portugal; Dep. Informatics Eng., Center Informatics and Systems, Univ. of Coimbra, Portugal

  • Venue:
  • IBERAMIA '08 Proceedings of the 11th Ibero-American conference on AI: Advances in Artificial Intelligence
  • Year:
  • 2008

Abstract

Overfitting arises frequently in text mining because feature spaces are high dimensional, which makes the task of learning algorithms difficult. Moreover, visualization is not feasible in such spaces. We focus on supervised text classification and present an approach that combines prior information about training labels, manifold learning, and Support Vector Machines (SVMs). Manifold learning serves as a pre-processing step that performs nonlinear dimensionality reduction to tackle the curse of dimensionality. We use Isomap (Isometric Mapping), which embeds text in a low-dimensional space while enhancing the geometric characteristics of the data by preserving geodesic distances within the manifold. Kernel-based machines can then be applied to advantage for the final text classification in this reduced space. Results on a real-world benchmark corpus from Reuters demonstrate the visualization capabilities of the method in the severely reduced space. Furthermore, we show that the method yields performance comparable to that obtained with single kernel-based machines.
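
The geodesic distance that Isomap preserves can be illustrated with a minimal sketch. This is not the authors' implementation: it covers only the first stage of standard Isomap (a k-nearest-neighbour graph followed by shortest paths), leaves out the embedding step, the use of training labels, and the SVM, and runs on toy 2-D points rather than text vectors. The names `euclidean`, `geodesic_distances`, and the half-circle data are hypothetical and chosen for illustration only.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two points given as coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def geodesic_distances(points, k):
    """Approximate geodesic distances as shortest paths on a symmetric k-NN graph.

    Assumes the resulting graph is connected; unreachable pairs stay at infinity.
    """
    n = len(points)
    inf = float("inf")
    dist = [[inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
        # Indices of the k nearest neighbours of point i, excluding i itself.
        neighbours = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: euclidean(points[i], points[j]),
        )[:k]
        for j in neighbours:
            d = euclidean(points[i], points[j])
            # Keep the graph undirected: an edge i-j is also an edge j-i.
            dist[i][j] = min(dist[i][j], d)
            dist[j][i] = min(dist[j][i], d)
    # Floyd-Warshall: relax every pair through every intermediate node.
    for m in range(n):
        for i in range(n):
            for j in range(n):
                via = dist[i][m] + dist[m][j]
                if via < dist[i][j]:
                    dist[i][j] = via
    return dist


# Toy manifold (an assumption for illustration): 9 points on a unit half circle.
n_points = 9
points = [
    (math.cos(math.pi * t / (n_points - 1)), math.sin(math.pi * t / (n_points - 1)))
    for t in range(n_points)
]

geo = geodesic_distances(points, k=2)
straight = euclidean(points[0], points[-1])  # chord across the half circle
along = geo[0][-1]                           # path that follows the curve

print("straight-line distance:", straight)
print("graph (geodesic) distance:", along)
```

On this toy curve the two endpoints are close in straight-line terms but far apart along the manifold, and the graph distance falls between the chord length and the true arc length. Isomap feeds such graph distances into classical multidimensional scaling to obtain the low-dimensional coordinates that a classifier would then use.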