Investigating unsupervised learning for text categorization bootstrapping

  • Authors:
  • Alfio Gliozzo; Carlo Strapparava; Ido Dagan

  • Affiliations:
  • Istituto per la Ricerca Scientifica e Tecnologica, Trento, Italy; Istituto per la Ricerca Scientifica e Tecnologica, Trento, Italy; Bar Ilan University, Ramat Gan, Israel

  • Venue:
  • HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
  • Year:
  • 2005

Abstract

We propose a generalized bootstrapping algorithm in which categories are described by relevant seed features. Our method introduces two unsupervised steps that improve the initial categorization step of the bootstrapping scheme: (i) using a Latent Semantic space to obtain a generalized similarity measure between instances and features, and (ii) using the Gaussian Mixture algorithm to obtain uniform classification probabilities for unlabeled examples. The algorithm was evaluated on two Text Categorization tasks and obtained state-of-the-art performance using only the category names as initial seeds.
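
The two unsupervised steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy corpus, the vocabulary, the helper names (`bag_of_words`, `lsa_basis`, `gmm_posterior`), the choice of two latent dimensions, and the variance floor are all assumptions made for illustration. The sketch projects documents and a single seed word into a latent space obtained by truncated SVD, scores each document by cosine similarity to the seed, and then fits a two-component one-dimensional Gaussian mixture to the scores with EM, reading the posterior of the higher-mean component as a category probability.

```python
import numpy as np

# Hypothetical toy corpus; vocabulary and documents are illustrative only.
VOCAB = ["soccer", "goal", "team", "match", "election", "vote", "party", "senate"]
DOCS = [
    "soccer goal team", "goal team match", "soccer match team",
    "election vote party", "vote party senate", "election senate party",
]

def bag_of_words(text):
    """Term-count vector over VOCAB."""
    vec = np.zeros(len(VOCAB))
    for word in text.split():
        vec[VOCAB.index(word)] += 1.0
    return vec

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def lsa_basis(doc_term, k):
    """Top-k left singular vectors of the term-by-document matrix."""
    U, _, _ = np.linalg.svd(doc_term.T, full_matrices=False)
    return U[:, :k]

def gmm_posterior(scores, n_iter=50, var_floor=1e-2):
    """Fit a 2-component 1-D Gaussian mixture by EM and return, for each
    score, the posterior of the component with the higher mean."""
    x = np.asarray(scores, dtype=float)[:, None]      # shape (n, 1)
    mu = np.array([x.min(), x.max()])                 # init at the extremes
    var = np.full(2, max(float(x.var()), var_floor))
    pi = np.array([0.5, 0.5])

    def responsibilities():
        dens = pi * np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        return dens / dens.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        resp = responsibilities()                     # E-step
        nk = resp.sum(axis=0) + 1e-12                 # M-step
        mu = (resp * x).sum(axis=0) / nk
        # The variance floor (an assumption) keeps components from collapsing.
        var = np.maximum((resp * (x - mu) ** 2).sum(axis=0) / nk, var_floor)
        pi = nk / len(x)
    return responsibilities()[:, int(np.argmax(mu))]

X = np.array([bag_of_words(d) for d in DOCS])         # documents x terms
U_k = lsa_basis(X, k=2)
seed = bag_of_words("soccer")                         # category described by one seed word

raw_sims = [cosine(x, seed) for x in X]               # similarity in term space
lsa_sims = [cosine(x @ U_k, seed @ U_k) for x in X]   # similarity in latent space
probs = gmm_posterior(lsa_sims)                       # mixture posterior per document
```

In this toy setting the second document ("goal team match") shares no word with the seed, so its term-space similarity is zero, while its latent-space similarity is high because its words co-occur with "soccer" elsewhere in the corpus. The corpus is deliberately clean, so the scores fall into two tight clusters; real collections would give a much wider spread of scores for the mixture to separate.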