Measuring comparability of documents in non-parallel corpora for efficient extraction of (semi-)parallel translation equivalents

  • Authors:
  • Fangzhong Su;Bogdan Babych

  • Affiliations:
  • University Of Leeds, Leeds, UK;University Of Leeds, Leeds, UK

  • Venue:
  • EACL 2012 Proceedings of the Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra)
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

In this paper we present and evaluate three approaches to measure comparability of documents in non-parallel corpora. We develop a task-oriented definition of comparability, based on the performance of automatic extraction of translation equivalents from the documents aligned by the proposed metrics, which formalises intuitive definitions of comparability for machine translation research. We demonstrate application of our metrics for the task of automatic extraction of parallel and semiparallel translation equivalents and discuss how these resources can be used in the frameworks of statistical and rule-based machine translation.