Translating collocations for bilingual lexicons: a statistical approach

  • Authors:
  • Frank Smadja;Kathleen R. McKeown;Vasileios Hatzivassiloglou

  • Affiliations:
  • NetPatrol Consulting;Columbia University;Columbia University

  • Venue:
  • Computational Linguistics
  • Year:
  • 1996

Quantified Score

Hi-index 0.00

Visualization

Abstract

Collocations are notoriously difficult for non-native speakers to translate, primarily because they are opaque and cannot be translated on a word-by-word basis. We describe a program named Champollion which, given a pair of parallel corpora in two different languages and a list of collocations in one of them, automatically produces their translations. Our goal is to provide a tool for compiling bilingual lexical information above the word level in multiple languages, for different domains. The algorithm we use is based on statistical methods and produces p-word translations of n-word collocations in which n and p need not be the same. For example, Champollion translates make...decision, employment equity, and stock market into prendre...décision, équité en matière d'emploi, and bourse respectively. Testing Champollion on three years' worth of the Hansards corpus yielded the French translations of 300 collocations for each year, evaluated at 73% accuracy on average. In this paper, we describe the statistical measures used, the algorithm, and the implementation of Champollion, presenting our results and evaluation.