Automatic acquisition of hyponyms from large text corpora

  • Authors:
  • Marti A. Hearst

  • Affiliations:
  • University of California, Berkeley, Berkeley, CA

  • Venue:
  • COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 2
  • Year:
  • 1992

Quantified Score

Hi-index 0.01

Abstract

We describe a method for the automatic acquisition of the hyponymy lexical relation from unrestricted text. Two goals motivate the approach: (i) avoidance of the need for pre-encoded knowledge and (ii) applicability across a wide range of text. We identify a set of lexico-syntactic patterns that are easily recognizable, that occur frequently and across text genre boundaries, and that indisputably indicate the lexical relation of interest. We describe a method for discovering these patterns and suggest that other lexical relations will also be acquirable in this way. A subset of the acquisition algorithm is implemented and the results are used to augment and critique the structure of a large hand-built thesaurus. Extensions and applications to areas such as information retrieval are suggested.
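
To make the idea of a lexico-syntactic pattern concrete, here is a minimal sketch in Python. The abstract does not list the patterns, so the "X such as A, B, and C" construction used here is only a commonly cited example of this kind of pattern. The regular expression, the function name `extract_hyponyms`, and the example sentence are all illustrative assumptions. Noun phrases are approximated by single words, whereas a real system would need proper noun-phrase recognition.

```python
import re

# Assumption: each noun phrase is approximated by a single word (\w+).
# Pattern sketch: "<hypernym> such as <hyponym>, <hyponym>, (and|or) <hyponym>"
SUCH_AS = re.compile(
    r"(?P<hyper>\w+),? such as "
    r"(?P<hypos>\w+(?:, (?!and |or )\w+)*(?:,? (?:and|or) \w+)?)"
)

def extract_hyponyms(text):
    """Return (hyponym, hypernym) pairs found by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(text):
        hypernym = match.group("hyper")
        # Split the list of hyponyms on commas and on the final "and"/"or".
        for hyponym in re.split(r",? (?:and|or) |, ", match.group("hypos")):
            pairs.append((hyponym, hypernym))
    return pairs

if __name__ == "__main__":
    sentence = "She visited countries such as France, Italy, and Spain."
    for hyponym, hypernym in extract_hyponyms(sentence):
        print(f"hyponym({hyponym}, {hypernym})")
```

Because the sketch matches surface strings only, it returns the head word next to "such as" and ignores modifiers. For example, "European countries" is reduced to "countries".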