Heuristics-Based Replenishment of Collocation Databases

  • Authors:
  • Igor A. Bolshakov;Alexander F. Gelbukh

  • Venue:
  • PorTAL '02 Proceedings of the Third International Conference on Advances in Natural Language Processing
  • Year:
  • 2002

Abstract

Collocations are defined as syntactically linked and semantically plausible combinations of content words. Since collocations make up the bulk of common texts and are language-dependent, creating collocation databases is prohibitively expensive. We present heuristics for the automatic generation of new Spanish collocations from those already present in a collocation database (CDB), with the help of a WordNet-like thesaurus: if a word A is semantically "similar" to a word B and a collocation B + C is known, then A + C is presumably a collocation of the same type, provided certain conditions are met.
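
The inference rule stated in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Collocation` record, the `infer_collocations` function, the `similar` mapping (standing in for a WordNet-like thesaurus lookup), and the `condition` predicate (standing in for the unspecified "certain conditions") are all hypothetical names introduced here, and the Spanish words are toy data rather than entries from the paper's database.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, NamedTuple, Set


class Collocation(NamedTuple):
    head: str       # word B in a known entry, word A in an inferred one
    relation: str   # collocation type, e.g. "noun+adj"
    collocate: str  # word C


def infer_collocations(
    known: Iterable[Collocation],
    similar: Dict[str, List[str]],
    condition: Callable[[str, str, Collocation], bool] = lambda a, b, c: True,
) -> Set[Collocation]:
    """If A is similar to B and B + C is known, propose A + C of the same type.

    `condition` is a placeholder for whatever filtering the heuristics
    require; by default it accepts every candidate (an assumption).
    """
    known_set = set(known)

    # Index known collocations by head word for quick lookup.
    by_head: Dict[str, List[Collocation]] = defaultdict(list)
    for col in known_set:
        by_head[col.head].append(col)

    inferred: Set[Collocation] = set()
    for a, neighbours in similar.items():
        for b in neighbours:
            for col in by_head.get(b, []):
                candidate = Collocation(a, col.relation, col.collocate)
                # Skip entries already in the database or rejected by the filter.
                if candidate not in known_set and condition(a, b, col):
                    inferred.add(candidate)
    return inferred


# Toy data (hypothetical): "café fuerte" is known, "té" is similar to "café".
known = [Collocation("café", "noun+adj", "fuerte")]
similar = {"té": ["café"]}

print(infer_collocations(known, similar))
```

In this sketch the candidate inherits the relation type of the source collocation, mirroring the phrase "of the same type"; any real filter would be supplied through `condition`.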