Clustering comparable corpora for bilingual lexicon extraction

  • Authors: Bo Li; Eric Gaussier; Akiko Aizawa

  • Affiliations: UJF-Grenoble/CNRS, France, LIG UMR; UJF-Grenoble/CNRS, France, LIG UMR; National Institute of Informatics, Tokyo, Japan

  • Venue: HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
  • Year: 2011

Abstract

This paper studies how to enhance the comparability of bilingual corpora in order to improve the quality of the bilingual lexicons extracted from them. We introduce a clustering-based approach to enhancing corpus comparability that exploits the homogeneity of the corpus while preserving most of the vocabulary of the original corpus. Our experiments support the soundness of this method and show that the bilingual lexicons obtained from the homogeneous corpus are of better quality than the lexicons obtained with previous approaches.