The accurate translation of collocations, or multi-word units, is essential for high-quality machine translation. However, many collocations do not translate compositionally and therefore require individual entries in the bilingual lexicon. We present a technique for collocation extraction from large corpora that takes into account the dispersion of the collocations throughout the corpus. Collocations are ranked to reflect more accurately how likely they are to occur in a wide variety of texts; collocations that are specific to a particular text are less useful for lexicon development. Once the collocations are extracted, lexicographers can develop appropriate bilingual lexical entries.
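To make the idea of dispersion-aware ranking concrete, the following is a minimal sketch, not the measure used in the paper, which the abstract does not specify. It assumes that a collocation candidate is simply an adjacent word pair and that dispersion is approximated by the fraction of documents in which the pair occurs. The function name `rank_bigrams` and the scoring formula (corpus frequency multiplied by document fraction) are illustrative assumptions.

```python
from collections import Counter


def rank_bigrams(documents):
    """Rank adjacent word pairs by corpus frequency weighted by dispersion.

    Assumption: dispersion is the fraction of documents containing the pair.
    This is an illustrative stand-in, not the paper's actual measure.
    """
    total = Counter()     # occurrences of each bigram across the whole corpus
    doc_freq = Counter()  # number of documents in which each bigram appears

    for text in documents:
        tokens = text.lower().split()
        bigrams = list(zip(tokens, tokens[1:]))
        total.update(bigrams)
        doc_freq.update(set(bigrams))  # count each document at most once

    n_docs = len(documents)
    scores = {bg: total[bg] * doc_freq[bg] / n_docs for bg in total}

    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    corpus = [
        "machine translation is hard",
        "good machine translation helps",
        "machine translation foo bar foo bar foo bar",
    ]
    for bigram, score in rank_bigrams(corpus)[:3]:
        print(" ".join(bigram), score)
```

In this toy corpus, "machine translation" and "foo bar" each occur three times, but "foo bar" is confined to a single document, so the dispersion weight ranks it lower. A pair that is frequent only within one text is exactly the kind of candidate the abstract describes as less useful for lexicon development.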