Using parsed corpora for structural disambiguation in the TRAINS domain

Authors: Mark Core
Affiliations: University of Rochester, Rochester, New York
Venue: ACL '96 Proceedings of the 34th annual meeting on Association for Computational Linguistics
Year: 1996

Abstract

This paper describes a prototype disambiguation module, KANKEI, which was tested on two corpora of the TRAINS project. In ambiguous verb phrases of the form V ... NP PP or V ... NP adverb(s), the two corpora have very different PP and adverb attachment patterns: in the first, the correct attachment is to the VP 88.7% of the time, while in the second, the correct attachment is to the NP 73.5% of the time. KANKEI uses various n-gram patterns of the phrase heads around these ambiguities and assigns each parse tree containing such an ambiguity a score based on a linear combination of the frequencies with which these patterns appear with NP and VP attachments in the TRAINS corpora. Unlike previous statistical disambiguation systems, this technique thus combines evidence from bigrams, trigrams, and the 4-gram around an ambiguous attachment. In the current experiments, equal weights are used for simplicity, but results are still good on the TRAINS corpora (92.2% and 92.4% accuracy). Despite the large statistical differences in attachment preferences between the two corpora, training on the first corpus and testing on the second gives an accuracy of 90.9%.
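
The scoring idea in the abstract can be illustrated with a minimal sketch. It is not KANKEI itself: the abstract does not specify which head patterns are used, how frequencies are normalized, or how ties are broken, so everything below is an assumption made for illustration. The sketch assumes each ambiguity is reduced to four heads (verb, noun, preposition, second noun), treats every position-tagged subset of two, three, or four heads as a pattern, scores an attachment site as an equally weighted sum of the relative frequency of that site for each seen pattern, and falls back to a hypothetical default on ties. The names `TRAIN`, `patterns`, `train`, `score`, and `attach`, and the toy training examples, are invented here.

```python
from collections import Counter
from itertools import combinations

# Hypothetical training format: four phrase heads plus the correct
# attachment site. These toy examples are invented, not TRAINS data.
TRAIN = [
    (("move", "engine", "to", "Corning"), "VP"),
    (("move", "boxcar", "to", "Avon"), "VP"),
    (("take", "engine", "at", "Avon"), "NP"),
    (("load", "boxcar", "of", "oranges"), "NP"),
]

def patterns(heads):
    """Yield every subset of 2, 3, or 4 heads, tagged with positions.

    Assumption: the paper's "various n-gram patterns" are approximated
    here by all position-tagged head subsets.
    """
    slots = list(enumerate(heads))
    for n in (2, 3, 4):
        for combo in combinations(slots, n):
            yield combo

def train(examples):
    """Count how often each pattern occurs with each attachment site."""
    counts = Counter()
    for heads, site in examples:
        for pat in patterns(heads):
            counts[(pat, site)] += 1
    return counts

def score(counts, heads, site, weights=None):
    """Linear combination of per-pattern relative frequencies for `site`.

    With weights=None every pattern gets weight 1.0 (equal weights).
    Otherwise `weights` maps pattern length (2, 3, 4) to a weight.
    """
    total = 0.0
    for pat in patterns(heads):
        seen = counts[(pat, "NP")] + counts[(pat, "VP")]
        if seen == 0:
            continue  # unseen pattern contributes no evidence
        w = 1.0 if weights is None else weights.get(len(pat), 1.0)
        total += w * counts[(pat, site)] / seen
    return total

def attach(counts, heads, default="VP"):
    """Pick the higher-scoring site; `default` on ties is an assumption."""
    np_score = score(counts, heads, "NP")
    vp_score = score(counts, heads, "VP")
    if np_score == vp_score:
        return default
    return "NP" if np_score > vp_score else "VP"

counts = train(TRAIN)
decision = attach(counts, ("move", "engine", "to", "Avon"))
```

In this toy setup the unseen 4-gram contributes nothing, so the decision rests on the shorter patterns that were observed in training. That is the kind of backing-off behavior that combining bigram, trigram, and 4-gram evidence is meant to provide. Unequal weights can be tried by passing, for example, `weights={2: 0.5, 3: 1.0, 4: 2.0}` to `score`.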