Document representation in natural language text retrieval

Authors:
Tomek Strzalkowski
Affiliations:
New York University, New York, NY
Venue:
HLT '94 Proceedings of the workshop on Human Language Technology
Year:
1994

Abstract

In information retrieval, the content of a document may be represented as a collection of terms: words, stems, phrases, or other units derived or inferred from the text of the document. These terms are usually weighted to indicate their importance within the document, which can then be viewed as a vector in an N-dimensional space. In this paper we demonstrate that proper term weighting is at least as important as term selection, and that different types of terms (e.g., words, phrases, names) and terms derived by different means (e.g., statistical, linguistic) must be treated differently for maximum benefit in retrieval. We report some observations made during and after the second Text REtrieval Conference (TREC-2).
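
To make the vector view concrete, the following is a minimal sketch of a weighted term representation in which the weight depends on the type of term. It is not the weighting scheme used in the paper: the tf-idf formula is a standard stand-in, and the names `TYPE_BOOST`, `build_vectors`, and `cosine`, as well as the multiplier values, are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical per-type multipliers. These values are illustrative
# assumptions, not figures reported in the paper.
TYPE_BOOST = {"word": 1.0, "phrase": 1.5, "name": 2.0}


def build_vectors(docs):
    """Turn documents into sparse weighted term vectors.

    docs: list of documents, each a list of (term, term_type) pairs.
    Returns one dict per document mapping term -> weight.
    """
    n_docs = len(docs)

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for doc in docs:
        df.update({term for term, _ in doc})

    vectors = []
    for doc in docs:
        tf = Counter(term for term, _ in doc)  # term frequency in this document
        term_type = dict(doc)                  # term -> its type
        vec = {}
        for term, count in tf.items():
            idf = math.log(n_docs / df[term])  # rarer terms weigh more
            vec[term] = count * idf * TYPE_BOOST[term_type[term]]
        vectors.append(vec)
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


docs = [
    [("retrieval", "word"), ("term weighting", "phrase")],
    [("retrieval", "word"), ("parser", "word")],
    [("TREC", "name"), ("parser", "word")],
]
vectors = build_vectors(docs)
```

Each dictionary is one point in a space whose dimensions are the distinct terms of the collection, and `cosine(vectors[0], vectors[1])` compares two documents in that space. Changing an entry in `TYPE_BOOST` changes the ranking without changing which terms were selected, which is the kind of effect the abstract attributes to weighting.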