Evaluating the Utility of Statistical Phrases and Latent Semantic Indexing for Text Classification

  • Authors:
  • Huiwen Wu; Dimitrios Gunopulos

  • Venue:
  • ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
  • Year:
  • 2002

Abstract

The term-based vector space model is a prominent technique to retrieve textual information. In this paper we examine the usefulness of phrases as terms in vector-based document classification. We focus on statistical techniques to extract both adjacent and window phrases from documents. We discover that the positive effect of adding phrase terms is very limited, if we have already achieved good performance using single-word terms, even when SVD/LSI is used as dimensionality reduction method.
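
To make the distinction between adjacent and window phrases concrete, here is a minimal Python sketch of one plausible reading. The abstract does not give the paper's exact definitions, so everything in it is an assumption made for illustration: the function names, the treatment of a window phrase as an unordered pair of distinct words within a fixed window, and the document-frequency threshold `min_df` used as the statistical filter.

```python
from collections import Counter


def adjacent_phrases(tokens):
    """Return ordered pairs of consecutive tokens (assumed definition)."""
    return list(zip(tokens, tokens[1:]))


def window_phrases(tokens, window=3):
    """Return unordered pairs of distinct tokens that co-occur within
    `window` consecutive tokens (assumed definition)."""
    pairs = set()
    for i, left in enumerate(tokens):
        # Look ahead at most window - 1 positions from the current token.
        for right in tokens[i + 1 : i + window]:
            if left != right:
                pairs.add(tuple(sorted((left, right))))
    return pairs


def frequent_phrases(docs, extractor, min_df=2):
    """Keep phrases that appear in at least `min_df` documents.

    `min_df` is a hypothetical threshold standing in for whatever
    statistical selection criterion is actually used.
    """
    doc_freq = Counter()
    for tokens in docs:
        # Count each phrase once per document.
        doc_freq.update(set(extractor(tokens)))
    return {phrase for phrase, n in doc_freq.items() if n >= min_df}


docs = [
    ["text", "classification", "with", "phrases"],
    ["text", "classification", "using", "svd"],
    ["latent", "semantic", "indexing"],
]
selected = frequent_phrases(docs, adjacent_phrases)
```

Phrases selected this way could be appended to the single-word vocabulary as extra term dimensions before any SVD/LSI step is applied; the sketch covers only the extraction stage.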