Experimental results of the signal processing approach to distributional clustering of terms on Reuters-21578 collection

  • Authors:
  • Marta Capdevila Dalmau; Oscar W. Márquez Flórez

  • Affiliations:
  • University of Vigo, Telecommunication Engineering School, Signal and Communications Processing Dpt., Vigo, Spain (both authors)

  • Venue:
  • ECIR'07: Proceedings of the 29th European Conference on IR Research
  • Year:
  • 2007

Abstract

Distributional Clustering has been shown to be an effective and powerful approach to supervised term extraction, aimed at reducing the dimensionality of the original indexing space for Automatic Text Categorization [2]. In a recent paper [1] we introduced a new Signal Processing approach to Distributional Clustering that achieved categorization results on the 20 Newsgroups dataset similar to those obtained by other information-theoretic approaches [3][4][5]. Here we re-validate our method by showing that the 90-category Reuters-21578 benchmark collection can be indexed with only 50 clusters at a minimal loss of categorization accuracy (around 2% with a Naïve Bayes categorizer).
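For readers new to distributional clustering, the minimal Python sketch below illustrates the generic idea only; it is not the signal-processing method of [1]. Each term t is represented by its class distribution P(c|t) estimated from labelled documents, terms are grouped into k clusters by a simple k-means-style loop under KL divergence (a swapped-in technique chosen for brevity, with deterministic farthest-first seeding), and each document is then indexed by k cluster counts instead of one count per vocabulary term. The function names (`class_distributions`, `cluster_terms`, `index_document`) are hypothetical, and the sketch assumes one label per document, a simplification since Reuters-21578 documents can carry several categories.

```python
import math
from collections import Counter, defaultdict


def class_distributions(docs, classes):
    """Map each term to (frequency, [P(c|t) for c in classes]).

    Assumes each document is a (tokens, single_label) pair (a simplification).
    """
    counts = defaultdict(Counter)
    for tokens, label in docs:
        for t in tokens:
            counts[t][label] += 1
    dists = {}
    for t, per_class in counts.items():
        total = sum(per_class.values())
        dists[t] = (total, [per_class[c] / total for c in classes])
    return dists


def kl(p, q, eps=1e-12):
    """KL divergence D(p || q); q is floored at eps to avoid log(0)."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def cluster_terms(dists, k, iters=20):
    """Hypothetical k-means-style clustering of terms under KL divergence."""
    # Most frequent terms first; ties keep insertion order (stable sort).
    terms = sorted(dists, key=lambda t: -dists[t][0])
    # Farthest-first seeding: start from the most frequent term, then add the
    # term whose distribution is farthest (in KL) from all current seeds.
    seeds = [terms[0]]
    while len(seeds) < k:
        far = max(terms, key=lambda t: min(kl(dists[t][1], dists[s][1]) for s in seeds))
        seeds.append(far)
    centroids = [list(dists[s][1]) for s in seeds]

    assign = {}
    for _ in range(iters):
        # Assign each term to the centroid with the smallest KL divergence.
        new_assign = {
            t: min(range(k), key=lambda j: kl(dists[t][1], centroids[j]))
            for t in terms
        }
        if new_assign == assign:
            break  # converged: no term changed cluster
        assign = new_assign
        # Recompute each centroid as the frequency-weighted mean distribution.
        for j in range(k):
            members = [t for t in terms if assign[t] == j]
            if not members:
                continue
            w = sum(dists[t][0] for t in members)
            n_classes = len(centroids[j])
            centroids[j] = [
                sum(dists[t][0] * dists[t][1][c] for t in members) / w
                for c in range(n_classes)
            ]
    return assign


def index_document(tokens, assign, k):
    """Represent a document by k cluster counts; unseen terms are ignored."""
    vec = [0] * k
    for t in tokens:
        if t in assign:
            vec[assign[t]] += 1
    return vec


docs = [
    (["oil", "crude", "barrel"], "energy"),
    (["oil", "opec", "barrel"], "energy"),
    (["wheat", "grain", "corn"], "grain"),
    (["grain", "wheat", "export"], "grain"),
]
assign = cluster_terms(class_distributions(docs, ["energy", "grain"]), k=2)
print(index_document(["oil", "wheat", "grain", "unknown"], assign, 2))  # [1, 2]
```

In the setting described in the abstract, `classes` would be the 90 Reuters-21578 categories and `k` would be 50, so each document vector shrinks from vocabulary size to 50 features before being passed to a categorizer such as Naïve Bayes.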