The anatomy of a large-scale hypertextual Web search engine
WWW7 Proceedings of the seventh international conference on World Wide Web 7
SIGDOC '86 Proceedings of the 5th annual international conference on Systems documentation
Statistical models for unsupervised prepositional phrase attachment
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 2
An unsupervised approach to prepositional phrase attachment using contextually similar words
ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics
A maximum entropy model for prepositional phrase attachment
HLT '94 Proceedings of the workshop on Human Language Technology
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Hi-index | 0.00 |
In this paper we revisit the classical NLP problem of prepositional phrase attachment (PP-attachment). Given the pattern V -NP1-P-NP2 in the text, where V is verb, NP1 is a noun phrase, P is the preposition and NP2 is the other noun phrase, the question asked is where does P - NP2 attach: V or NP1? This question is typically answered using both the word and the world knowledge. Word Sense Disambiguation (WSD) and Data Sparsity Reduction (DSR) are the two requirements for PP-attachment resolution. Our approach described in this paper makes use of training data extracted from raw text, which makes it an unsupervised approach. The unambiguous V - P - N and N1 - P -N2 tuples of the training corpus TEACH the system how to resolve the attachments in the ambiguous V - N1 - P - N2 tuples of the test corpus. A graph based approach to word sense disambiguation (WSD) is used to obtain the accurate word knowledge. Further, the data sparsity problem is addressed by (i) detecting synonymy using the wordnet and (ii) doing a form of inferencing based on the matching of Vs and Ns in the unambiguous patterns of V - P - NP, NP1 - P - NP2. For experimentation, Brown Corpus provides the training data andWall Street Journal Corpus the test data. The accuracy obtained for PP-attachment resolution is close to 85%. The novelty of the system lies in the flexible use of WSD and DSR phases.