Collocation Discovery for Optimal Bilingual Lexicon Development

  • Authors:
  • Scott McDonald;Davide Turcato;Paul McFetridge;Fred Popowich;Janine Toole

  • Affiliations:
  • -;-;-;-;-

  • Venue:
  • AI '00 Proceedings of the 13th Biennial Conference of the Canadian Society on Computational Studies of Intelligence: Advances in Artificial Intelligence
  • Year:
  • 2000

Quantified Score

Hi-index 0.00

Visualization

Abstract

The accurate translation of collocations, or multi-word units, is essential for high quality machine translation. However, many collocations do not translate compositionally, thus requiring individual entries in the bilingual lexicon. We present a technique for collocation extraction from large corpora that takes into account the dispersion of the collocations throughout the corpus. Collocations are ranked to more accurately reflect how likely they are to occur in a wide variety of texts; collocations which are specific to a particular text are less useful for lexicon development. Once the collocations are extracted, appropriate bilingual lexical entries can be developed by lexicographers.