Language-independent bilingual terminology extraction from a multilingual parallel corpus

  • Authors:
  • Els Lefever; Lieve Macken; Veronique Hoste

  • Affiliations:
  • University College Ghent, Gent, Belgium and Ghent University, Gent, Belgium;University College Ghent, Gent, Belgium and Ghent University, Gent, Belgium;University College Ghent, Gent, Belgium and Ghent University, Gent, Belgium

  • Venue:
  • EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
  • Year:
  • 2009

Abstract

We present a language-pair independent terminology extraction module that is based on a sub-sentential alignment system that links linguistically motivated phrases in parallel texts. Statistical filters are applied to the bilingual list of candidate terms that is extracted from the alignment output. We compare the performance of the alignment and terminology extraction modules for three language pairs (French-English, French-Italian and French-Dutch) and highlight language-pair specific problems (e.g., the different compounding strategies of French and Dutch). Comparisons with standard terminology extraction programs show an improvement of up to 20% for bilingual terminology extraction and competitive results (85% to 90% accuracy) for monolingual terminology extraction. They also reveal that the linguistically based alignment module is particularly well suited to the extraction of complex multiword terms.