Use of syntactic context to produce term association lists for text retrieval

  • Authors:
  • Gregory Grefenstette

  • Affiliations:
  • Computer Science Department, University of Pittsburgh, Pittsburgh, PA

  • Venue:
  • SIGIR '92 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 1992

Abstract

One aspect of world knowledge essential to information retrieval is knowing when two words are related. Knowing word relatedness allows a system, given a user's query terms, to retrieve relevant documents that do not contain those exact terms. Two words can be said to be related if they appear in the same contexts. Document co-occurrence gives a measure of word relatedness that has proved too rough to be useful. The relatively recent appearance of on-line dictionaries and of robust, rapid parsers permits the extraction of finer word contexts from large corpora. In this paper, we describe such an extraction technique, which uses only coarse syntactic analysis and no domain knowledge. This technique produces lists of words related to any word appearing in a corpus. When the closest related terms were used for query expansion on a standard information retrieval testbed, the results were much better than those given by document co-occurrence techniques and slightly better than those obtained with unexpanded queries, supporting the contention that semantically similar words were indeed extracted by this technique.
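
The idea in the abstract (words are related when they share syntactic contexts, and the closest related words can expand a query) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the `(word, relation, context_word)` triples stand in for the output of a coarse parser, the sample data is invented, and plain Jaccard overlap is used as the similarity measure because the abstract does not say which measure the paper uses. The function names `build_contexts`, `related_terms`, and `expand_query` are hypothetical.

```python
from collections import defaultdict

# Invented sample data standing in for coarse parser output:
# (word, syntactic relation, context word). Not taken from the paper.
TRIPLES = [
    ("tumor", "adj", "malignant"),
    ("tumor", "obj_of", "remove"),
    ("tumor", "obj_of", "detect"),
    ("lesion", "adj", "malignant"),
    ("lesion", "obj_of", "remove"),
    ("lesion", "obj_of", "detect"),
    ("growth", "adj", "malignant"),
    ("growth", "obj_of", "detect"),
    ("growth", "adj", "rapid"),
    ("patient", "adj", "elderly"),
    ("patient", "obj_of", "treat"),
]


def build_contexts(triples):
    """Map each word to the set of (relation, context word) pairs it occurs with."""
    contexts = defaultdict(set)
    for word, relation, other in triples:
        contexts[word].add((relation, other))
    return contexts


def jaccard(a, b):
    """Plain Jaccard overlap of two context sets (an assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related_terms(word, contexts, top_n=3):
    """Return up to top_n words sharing the most syntactic contexts with `word`."""
    target = contexts.get(word, set())
    scored = [
        (jaccard(target, ctx), other)
        for other, ctx in contexts.items()
        if other != word
    ]
    # Drop words with no shared context, then sort by score, ties alphabetically.
    scored = [(score, other) for score, other in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:top_n]]


def expand_query(terms, contexts, per_term=1):
    """Append the closest related words of each query term, without duplicates."""
    expanded = list(terms)
    for term in terms:
        for candidate in related_terms(term, contexts, top_n=per_term):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


contexts = build_contexts(TRIPLES)
print(related_terms("tumor", contexts))
print(expand_query(["tumor"], contexts))
```

On this toy data, "lesion" shares all three contexts with "tumor" and "growth" shares two, while "patient" shares none and is left out. A document co-occurrence approach would instead count words appearing anywhere in the same document, which is the coarser signal the abstract contrasts against.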