Using parsed corpora for structural disambiguation in the TRAINS domain

  • Author: Mark Core
  • Affiliation: University of Rochester, Rochester, New York
  • Venue: ACL '96, Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics
  • Year: 1996

Abstract

This paper describes KANKEI, a prototype disambiguation module that was tested on two corpora from the TRAINS project. In ambiguous verb phrases of the form V ... NP PP or V ... NP adverb(s), the two corpora show very different PP and adverb attachment patterns: in the first, the correct attachment is to the VP 88.7% of the time, while in the second, the correct attachment is to the NP 73.5% of the time. KANKEI uses various n-gram patterns of the phrase heads around these ambiguities and assigns each parse tree containing such an ambiguity a score based on a linear combination of the frequencies with which these patterns appear with NP and VP attachments in the TRAINS corpora. Unlike previous statistical disambiguation systems, this technique combines evidence from bigrams, trigrams, and the 4-gram around an ambiguous attachment. The current experiments use equal weights for simplicity, yet results on the TRAINS corpora are still good (92.2% and 92.4% accuracy). Despite the large statistical differences in attachment preferences between the two corpora, training on the first corpus and testing on the second gives an accuracy of 90.9%.
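
The scoring idea in the abstract can be illustrated with a minimal sketch. This is not KANKEI's implementation: the abstract does not say which n-gram patterns are used or how frequencies are normalized, so the sketch assumes every combination of two, three, or four of the heads (verb, object noun, preposition, PP noun) and scores each attachment by an equally weighted sum of relative frequencies. The names `patterns`, `AttachmentScorer`, and `TOY`, and the toy training examples themselves, are hypothetical and are not drawn from the TRAINS corpora.

```python
from collections import Counter
from itertools import combinations

ATTACHMENTS = ("VP", "NP")


def patterns(verb, noun, prep, pp_noun):
    """Return bigram, trigram, and 4-gram patterns over the four heads.

    Assumption: all position-labelled combinations of size 2..4 are used;
    the abstract only says "various n-gram patterns".
    """
    heads = (("v", verb), ("n1", noun), ("p", prep), ("n2", pp_noun))
    return [combo for size in (2, 3, 4) for combo in combinations(heads, size)]


class AttachmentScorer:
    """Toy scorer: equally weighted sum of per-pattern attachment frequencies."""

    def __init__(self):
        self.counts = {}  # pattern -> Counter of attachment labels

    def train(self, examples):
        for heads, attachment in examples:
            for pattern in patterns(*heads):
                self.counts.setdefault(pattern, Counter())[attachment] += 1

    def score(self, heads, attachment, weight=1.0):
        total = 0.0
        for pattern in patterns(*heads):
            seen = self.counts.get(pattern)
            if seen:  # unseen patterns contribute nothing
                total += weight * seen[attachment] / sum(seen.values())
        return total

    def predict(self, heads):
        # Ties (including no evidence at all) fall back to the first label, "VP".
        return max(ATTACHMENTS, key=lambda a: self.score(heads, a))


# Invented examples for illustration only.
TOY = [
    (("take", "engine", "to", "Avon"), "VP"),
    (("send", "engine", "to", "Bath"), "VP"),
    (("load", "boxcar", "of", "oranges"), "NP"),
]

scorer = AttachmentScorer()
scorer.train(TOY)
print(scorer.predict(("take", "boxcar", "to", "Corning")))
print(scorer.predict(("load", "tanker", "of", "juice")))
```

Summing over all pattern sizes lets a sparse 4-gram count be backed up by the more frequently observed bigrams and trigrams, which is the kind of evidence combination the abstract describes. Replacing the single `weight` with per-pattern weights would be the natural extension beyond equal weighting.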