Evaluating the Utility of Statistical Phrases and Latent Semantic Indexing for Text Classification

Authors:
Huiwen Wu;Dimitrios Gunopulos
Venue:
ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
Year:
2002

Citing 0
Cited 3

Overcoming the brittleness bottleneck using Wikipedia: enhancing text categorization with encyclopedic knowledge
AAAI'06 Proceedings of the 21st National Conference on Artificial Intelligence - Volume 2

Wikipedia-based semantic interpretation for natural language processing
Journal of Artificial Intelligence Research

Hybrid singular value decomposition: a model of human text classification
Proceedings of the 2007 conference on Human interface: Part I

Abstract

The term-based vector space model is a prominent technique to retrieve textual information. In this paper we examine the usefulness of phrases as terms in vector-based document classification. We focus on statistical techniques to extract both adjacent and window phrases from documents. We discover that the positive effect of adding phrase terms is very limited, if we have already achieved good performance using single-word terms, even when SVD/LSI is used as dimensionality reduction method.
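
To make the distinction between adjacent and window phrases concrete, the following is a minimal sketch in Python. The abstract does not state the paper's extraction criteria, so everything here is an assumption for illustration: the function names, the two-word phrase length, the window size, and the document-frequency cutoff `min_df` are all hypothetical, and a plain frequency threshold stands in for whatever statistical test the authors used.

```python
from collections import Counter

def adjacent_phrases(tokens):
    """Ordered pairs of words that occur directly next to each other."""
    return {(a, b) for a, b in zip(tokens, tokens[1:])}

def window_phrases(tokens, window=3):
    """Unordered pairs of distinct words co-occurring within `window` tokens."""
    pairs = set()
    for i in range(len(tokens)):
        # Look ahead at most window - 1 positions from token i.
        for j in range(i + 1, min(i + window, len(tokens))):
            if tokens[i] != tokens[j]:
                pairs.add(tuple(sorted((tokens[i], tokens[j]))))
    return pairs

def frequent_phrases(docs, extractor, min_df=2):
    """Keep phrases found in at least `min_df` documents (assumed cutoff)."""
    df = Counter()
    for doc in docs:
        # Each extractor returns a set, so a phrase counts once per document.
        df.update(extractor(doc.lower().split()))
    return {phrase for phrase, count in df.items() if count >= min_df}

docs = [
    "text classification with support vector machines",
    "latent semantic indexing for text classification",
    "classification of short text",
]
adj = frequent_phrases(docs, adjacent_phrases)
win = frequent_phrases(docs, window_phrases)
print(adj)  # {('text', 'classification')}
print(win)  # {('classification', 'text')}
```

In a pipeline of the kind the abstract describes, the surviving phrases would be appended to the single-word vocabulary as extra terms before any SVD/LSI step; this sketch covers only the phrase extraction.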