Bilingual lexicon extraction from comparable corpora enhanced with parallel corpora

  • Authors:
  • Emmanuel Morin; Emmanuel Prochasson

  • Affiliations:
  • Université de Nantes, LINA - UMR CNRS, BP, Nantes Cedex; Université de Nantes, LINA - UMR CNRS, BP, Nantes Cedex

  • Venue:
  • BUCC '11 Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web
  • Year:
  • 2011

Abstract

In this article, we present a simple and effective approach for extracting a bilingual lexicon from comparable corpora enhanced with parallel corpora. We exploit the structural characteristics of the documents that make up the comparable corpus to extract high-quality parallel sentences. We then use state-of-the-art techniques to build a specialized bilingual lexicon from these sentences, and we evaluate the contribution of this lexicon when it is added to the comparable corpus-based alignment technique. Finally, we demonstrate the value of this approach through improved translation accuracy for medical words.
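The abstract does not spell out the comparable corpus-based alignment technique, so the following is only a minimal sketch under stated assumptions: it uses the common context-vector ("direct") approach, in which a word's co-occurrence vector is projected into the target language through a seed bilingual lexicon and compared with target-word vectors by cosine similarity. Raw co-occurrence counts stand in for the association measure (such as mutual information or log-likelihood) that such systems usually apply. The function names, toy corpora, and lexicons are hypothetical, and "enhancing" the seed lexicon is modeled here simply as merging a general dictionary with a specialized lexicon of the kind one might extract from parallel sentences.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(tokens, window=2):
    """Map each word to a Counter of the words co-occurring within +/- window positions."""
    vectors = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vectors[word][tokens[j]] += 1
    return vectors


def merge_lexicons(general, specialized):
    """Hypothetical enhancement step: add specialized entries to a copy of the general dictionary."""
    merged = {word: set(targets) for word, targets in general.items()}
    for word, targets in specialized.items():
        merged.setdefault(word, set()).update(targets)
    return merged


def translate_vector(vector, seed_lexicon):
    """Project a source context vector into the target language; unknown context words are dropped."""
    projected = Counter()
    for word, weight in vector.items():
        for target in seed_lexicon.get(word, ()):
            projected[target] += weight
    return projected


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def rank_translations(word, src_vectors, tgt_vectors, seed_lexicon, k=5):
    """Rank every target word as a candidate translation of `word`, best first."""
    projected = translate_vector(src_vectors[word], seed_lexicon)
    scores = [(cand, cosine(projected, vec)) for cand, vec in tgt_vectors.items()]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


if __name__ == "__main__":
    # Toy French source and English target "corpora" (hypothetical data).
    src = context_vectors(["biopsie", "tumeur", "maligne"], window=1)
    tgt = context_vectors(["biopsy", "tumour", "malignant", "engine", "car", "fast"], window=1)
    general = {"maligne": {"malignant"}}      # general dictionary lacks "biopsie"
    specialized = {"biopsie": {"biopsy"}}     # e.g. learned from parallel sentences
    print(rank_translations("tumeur", src, tgt, general, k=2))
    print(rank_translations("tumeur", src, tgt, merge_lexicons(general, specialized), k=2))
```

With the general dictionary alone, "tumour" and the unrelated "engine" receive the same score because only one context word can be translated; once the specialized entry is added, "tumour" is ranked clearly ahead. This illustrates, on toy data only, why extra in-domain seed entries can help the context-vector approach.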