Distributional thesaurus versus wordnet: a comparison of backoff techniques for unsupervised PP attachment

  • Authors:
  • Hiram Calvo; Alexander Gelbukh; Adam Kilgarriff

  • Affiliations:
  • Center for Computing Research, National Polytechnic Institute, Mexico; Center for Computing Research, National Polytechnic Institute, Mexico; Lexical Computing Ltd., United Kingdom

  • Venue:
  • CICLing'05: Proceedings of the 6th International Conference on Computational Linguistics and Intelligent Text Processing
  • Year:
  • 2005

Abstract

Prepositional phrase (PP) attachment can be addressed using frequency counts of dependency triples observed in an unannotated corpus. However, not every triple occurs even in very large corpora. Several techniques have been used to address this problem. We evaluate two backoff methods, one based on WordNet and the other on a distributional (automatically created) thesaurus, working on Spanish. The thesaurus is built from the dependency triples found in the same corpus that is used to count the frequencies of unambiguous triples. The training corpus for both methods is an encyclopaedia. The method based on the distributional thesaurus achieves higher coverage but lower precision than the WordNet-based method.
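
The general idea of count-based attachment with a thesaurus backoff can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the scoring formula, so the similarity-weighted sum used here is an assumption, and the names `TRIPLE_COUNTS`, `THESAURUS`, `score` and `attach`, as well as all counts and similarity values, are hypothetical toy data.

```python
from collections import Counter

# Toy counts of unambiguous (head, preposition, noun) triples.
# All values are invented for illustration only.
TRIPLE_COUNTS = Counter({
    ("comer", "con", "tenedor"): 3,
    ("pasta", "con", "salsa"): 4,
})

# Toy thesaurus: word -> list of (similar word, similarity).
# A WordNet-based backoff could supply neighbours in the same shape.
THESAURUS = {
    "cuchara": [("tenedor", 0.8)],
    "queso": [("salsa", 0.5)],
}


def score(head, prep, noun, counts, thesaurus):
    """Return the direct triple count, or a backoff estimate if it is unseen.

    Assumption: the backoff sums the counts of triples in which the unseen
    noun is replaced by a thesaurus neighbour, weighted by similarity.
    """
    direct = counts[(head, prep, noun)]
    if direct > 0:
        return float(direct)
    return float(sum(sim * counts[(head, prep, other)]
                     for other, sim in thesaurus.get(noun, [])))


def attach(verb, noun1, prep, noun2, counts=TRIPLE_COUNTS, thesaurus=THESAURUS):
    """Decide whether 'prep noun2' attaches to the verb or to noun1.

    Returns "verb", "noun", or None when the evidence is tied or missing.
    """
    verb_score = score(verb, prep, noun2, counts, thesaurus)
    noun_score = score(noun1, prep, noun2, counts, thesaurus)
    if verb_score == noun_score:
        return None  # undecided cases reduce coverage
    return "verb" if verb_score > noun_score else "noun"


if __name__ == "__main__":
    print(attach("comer", "pasta", "con", "tenedor"))  # seen triple
    print(attach("comer", "pasta", "con", "cuchara"))  # resolved via backoff
    print(attach("comer", "pasta", "con", "mesa"))     # no evidence at all
```

In this sketch, coverage corresponds to the share of cases for which `attach` returns a decision rather than `None`, and precision to the share of those decisions that are correct. A backoff that supplies more neighbours will decide more cases, but each borrowed count is weaker evidence than a directly observed triple.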