Automatic processing of large corpora for the resolution of anaphora references

  • Authors: Ido Dagan; Alon Itai
  • Affiliations: Technion, Haifa, Israel; Technion, Haifa, Israel
  • Venue: COLING '90 Proceedings of the 13th conference on Computational linguistics - Volume 3
  • Year: 1990

Abstract

Manual acquisition of semantic constraints in broad domains is very expensive. This paper presents an automatic scheme for collecting statistics on cooccurrence patterns in a large corpus. To a large extent, these statistics reflect semantic constraints, and so they can be used to resolve anaphora references and syntactic ambiguities. The scheme was implemented by gathering statistics on the output of other linguistic tools. An experiment was performed to resolve references of the pronoun "it" in sentences selected at random from the corpus. The results show that in most cases the cooccurrence statistics do reflect the semantic constraints and thus provide a basis for a useful disambiguation tool.
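
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the triple format `(relation, head, dependent)`, the function names `collect_cooccurrences` and `resolve_pronoun`, and the toy data are all assumptions made for illustration. The sketch assumes that another tool (a parser) has already produced the triples, counts them, and then prefers the candidate antecedent that most often fills the slot occupied by "it".

```python
from collections import Counter


def collect_cooccurrences(parsed_relations):
    """Count how often each (relation, head, dependent) triple occurs.

    The triples are assumed to come from an external parser run over the corpus.
    """
    return Counter(parsed_relations)


def resolve_pronoun(counts, relation, head, candidates):
    """Pick the candidate seen most often in the pronoun's syntactic slot.

    Returns None when there are no candidates or none was ever observed
    in that slot, so the caller can fall back to another strategy.
    """
    if not candidates:
        return None
    scored = [(counts[(relation, head, c)], c) for c in candidates]
    best_count, best = max(scored, key=lambda pair: pair[0])
    return best if best_count > 0 else None


# Toy, invented triples standing in for parser output over a corpus.
relations = [
    ("verb-object", "collect", "money"),
    ("verb-object", "collect", "money"),
    ("verb-object", "collect", "tax"),
    ("subject-verb", "collect", "government"),
]
counts = collect_cooccurrences(relations)

# Hypothetical case: "it" is the object of "collect", and the preceding
# text offers two candidate antecedents.
choice = resolve_pronoun(counts, "verb-object", "collect", ["government", "money"])
print(choice)
```

Here "money" is preferred because it was observed as an object of "collect" while "government" was not. A real system would need much larger counts and some threshold before trusting the difference.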