EM-based hybrid model for bilingual terminology extraction from comparable corpora

  • Authors:
  • Lianhau Lee;Aiti Aw;Min Zhang;Haizhou Li

  • Affiliations:
  • Institute for Infocomm Research;Institute for Infocomm Research;Institute for Infocomm Research;Institute for Infocomm Research

  • Venue:
  • COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
  • Year:
  • 2010

Abstract

In this paper, we present an unsupervised hybrid model that combines statistical, lexical, linguistic, contextual, and temporal features in a generic EM-based framework to harvest bilingual terminology from comparable corpora under a comparable-document alignment constraint. The model is configurable for any language and extensible with additional features. Overall, it yields a considerable improvement in performance over the baseline method. In addition, the model shows promising capability to discover new bilingual terminology with limited use of dictionaries.