A Technical Word- and Term-Translation Aid Using Noisy Parallel Corpora across Language Groups

  • Authors:
  • Pascale Fung;Kathleen McKeown

  • Affiliations:
  • Computer Science Department, Columbia University, New York, NY 10027, U.S.A., pascale@cs.columbia.edu;Computer Science Department,

  • Venue:
  • Machine Translation
  • Year:
  • 1998

Abstract

Technical-term translation represents one of the most difficult tasks for human translators since (1) most translators are not familiar with terms and domain-specific terminology and (2) such terms are not adequately covered by printed dictionaries. This paper describes an algorithm for translating technical words and terms from noisy parallel corpora across language groups. Given any word which is part of a technical term in the source language, the algorithm produces a ranked candidate match for it in the target language. Potential translations for the term are compiled from the matched words and are also ranked. We show how this ranked list helps translators in technical-term translation. Most algorithms for lexical and term translation focus on Indo-European language pairs, and most use a sentence-aligned clean parallel corpus without insertion, deletion or OCR noise. Our algorithm is language- and character-set-independent, and is robust to noise in the corpus. We show how our algorithm requires minimum preprocessing and is able to obtain technical-word translations without sentence-boundary identification or sentence alignment, from the English–Japanese awk manual corpus with noise arising from text insertions or deletions and on the English–Chinese HKUST bilingual corpus. We obtain a precision of 55.35% from the awk corpus for word translation including rare words, counting only the best candidate and direct translations. Translation precision of the best-candidate translation is 89.93% from the HKUST corpus. Potential term translations produced by the program help bilingual speakers to get a 47% improvement in translating technical terms.
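
The abstract reports precision "counting only the best candidate". As a rough illustration of that kind of measure, the sketch below scores a ranked candidate list by checking only the top-ranked entry for each source word. This is not the authors' evaluation code: the function name `top1_precision`, the dictionary layout, and the toy English–Japanese entries are all assumptions made for the example.

```python
from typing import Dict, List, Set


def top1_precision(ranked: Dict[str, List[str]], gold: Dict[str, Set[str]]) -> float:
    """Share of evaluated source words whose best-ranked candidate is accepted.

    Only words present in both `ranked` and `gold` are evaluated; a word with
    an empty candidate list counts as incorrect.
    """
    evaluated = [word for word in gold if word in ranked]
    if not evaluated:
        return 0.0
    correct = sum(
        1 for word in evaluated
        if ranked[word] and ranked[word][0] in gold[word]
    )
    return correct / len(evaluated)


# Hypothetical toy data (not taken from the paper): candidates are listed
# best-first for each source word.
ranked_candidates = {
    "pattern": ["パターン", "文字列"],
    "field": ["レコード", "フィールド"],  # best candidate is wrong here
    "record": ["レコード"],
    "string": ["文字列", "パターン"],
}
accepted = {
    "pattern": {"パターン"},
    "field": {"フィールド"},
    "record": {"レコード"},
    "string": {"文字列"},
}

score = top1_precision(ranked_candidates, accepted)
print(f"{score:.2%}")  # prints 75.00%
```

In this toy case three of the four best candidates are accepted, so the score is 75%. A measure that also credits lower-ranked candidates would score the "field" entry as correct, which is why the choice of cutoff matters when comparing reported figures.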