Corpus-based identification of non-anaphoric noun phrases

  • Authors: David L. Bean; Ellen Riloff
  • Affiliations: University of Utah, Salt Lake City, Utah (both authors)
  • Venue: ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
  • Year: 1999

Abstract

Coreference resolution involves finding antecedents for anaphoric discourse entities, such as definite noun phrases. But many definite noun phrases are not anaphoric because their meaning can be understood from general world knowledge (e.g., "the White House" or "the news media"). We have developed a corpus-based algorithm for automatically identifying definite noun phrases that are non-anaphoric, which has the potential to improve the efficiency and accuracy of coreference resolution systems. Our algorithm generates lists of non-anaphoric noun phrases and noun phrase patterns from a training corpus and uses them to recognize non-anaphoric noun phrases in new texts. Using 1600 MUC-4 terrorism news articles as the training corpus, our approach achieved 78% recall and 87% precision at identifying such noun phrases in 50 test documents.
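
The abstract does not say how the lists are built, so the following is only a rough sketch of the general idea (learn a list from a training corpus, then look up definite noun phrases in new text), not the authors' algorithm. It assumes that noun phrases are already chunked, and it uses one illustrative heuristic that is an assumption here: a definite noun phrase seen in the first sentence of several training documents has no earlier text to refer back to, so it is treated as non-anaphoric. The names `build_nonanaphoric_list`, `classify`, `precision_recall`, and the `min_count` threshold are all hypothetical.

```python
from collections import Counter


def is_definite(np):
    """Crude test for a definite noun phrase: it starts with 'the'."""
    return np.lower().startswith("the ")


def build_nonanaphoric_list(training_docs, min_count=2):
    """Collect definite NPs that open at least `min_count` training documents.

    training_docs: list of documents; each document is a list of sentences;
    each sentence is a list of pre-chunked NP strings (an assumed input format).
    """
    first_sentence_counts = Counter()
    for doc in training_docs:
        if not doc:
            continue
        for np in doc[0]:  # only the first sentence of each document
            if is_definite(np):
                first_sentence_counts[np.lower()] += 1
    return {np for np, n in first_sentence_counts.items() if n >= min_count}


def classify(doc, nonanaphoric):
    """Label each definite NP in a new document as non-anaphoric (True) or not."""
    return [
        (np, np.lower() in nonanaphoric)
        for sentence in doc
        for np in sentence
        if is_definite(np)
    ]


def precision_recall(predicted, gold):
    """Precision and recall of a predicted set against a gold-standard set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy data, invented purely for illustration.
training = [
    [["the White House", "a statement"], ["the statement"]],
    [["the White House", "the news media"], ["the report"]],
    [["the news media"], ["the attack"]],
]
learned = build_nonanaphoric_list(training)
new_doc = [["a bomb"], ["the bomb", "the White House"]]
labels = classify(new_doc, learned)
```

On this toy data, `learned` contains "the white house" and "the news media", so `labels` marks "the White House" as non-anaphoric and leaves "the bomb" for the coreference resolver. Precision and recall figures like those reported in the abstract would come from comparing such predictions with hand-annotated test documents, as `precision_recall` does for two sets.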