Adaptive Parallel Sentences Mining from Web Bilingual News Collection

  • Authors:
  • Bing Zhao; Stephan Vogel

  • Venue:
  • ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
  • Year:
  • 2002

Abstract

This paper describes a robust, adaptive approach for mining parallel sentences from a bilingual comparable news collection. Sentence length models and lexicon-based models are combined under a maximum likelihood criterion. Specific models are proposed to handle insertions and deletions, which are frequent in bilingual data collected from the web. The approach is adaptive: it iteratively updates the translation lexicon using the mined parallel data, which improves vocabulary coverage and the estimation of translation probability parameters. Experiments are carried out on 10 years of the Xinhua bilingual news collection. Using the mined data yields a significant improvement in word-to-word alignment accuracy for machine translation modeling.
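The abstract does not give the model formulas, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes a Gaussian over the log ratio of sentence lengths as the length model, an IBM Model-1-style word translation probability as the lexicon model, a fixed "NULL" probability as a crude stand-in for handling inserted words, equal weights when adding the two log-likelihoods, and one EM step of lexicon re-estimation per mining iteration. All function names (`length_log_prob`, `lexicon_log_prob`, `pair_score`, `reestimate`, `mine`) and parameter values (`mean`, `std`, `null_prob`, `floor`, `threshold`) are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical translation table: lex[(tgt_word, src_word)] = t(tgt_word | src_word).
Lexicon = Dict[Tuple[str, str], float]
Pair = Tuple[List[str], List[str]]


def length_log_prob(src_len: int, tgt_len: int, mean: float = 0.0, std: float = 0.4) -> float:
    """Gaussian log-density of log(tgt_len / src_len); mean and std are assumed, not estimated."""
    x = math.log(tgt_len / src_len)
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2 * math.pi))


def lexicon_log_prob(src: List[str], tgt: List[str], lex: Lexicon, null_prob: float = 1e-3) -> float:
    """IBM Model-1-style log P(tgt | src), averaged per target word.

    `null_prob` is an assumed fixed t(f | NULL), so a target word with no
    counterpart in src (an insertion) is penalized but never scored as impossible.
    """
    total = 0.0
    for f in tgt:
        p = (null_prob + sum(lex.get((f, e), 0.0) for e in src)) / (len(src) + 1)
        total += math.log(p)
    return total / len(tgt)


def pair_score(src: List[str], tgt: List[str], lex: Lexicon) -> float:
    """Combine both models by adding their log-likelihoods (equal weights assumed)."""
    if not src or not tgt:
        return float("-inf")  # an empty side cannot be a parallel sentence pair
    return length_log_prob(len(src), len(tgt)) + lexicon_log_prob(src, tgt, lex)


def reestimate(pairs: List[Pair], lex: Lexicon, floor: float = 1e-6) -> Lexicon:
    """One Model-1 EM step over the mined pairs; unseen word pairs start at `floor`."""
    counts: Dict[Tuple[str, str], float] = defaultdict(float)
    totals: Dict[str, float] = defaultdict(float)
    for src, tgt in pairs:
        for f in tgt:
            weights = [lex.get((f, e), floor) for e in src]
            z = sum(weights)
            for e, w in zip(src, weights):
                counts[(f, e)] += w / z  # expected count of f aligned to e
                totals[e] += w / z
    # Normalize so that t(. | e) sums to 1 for every source word e.
    return {(f, e): c / totals[e] for (f, e), c in counts.items()}


def mine(candidates: List[Pair], lex: Lexicon, threshold: float, iterations: int = 3):
    """Alternate between keeping pairs scoring above `threshold` and updating the lexicon."""
    mined: List[Pair] = []
    for _ in range(iterations):
        mined = [(s, t) for s, t in candidates if pair_score(s, t, lex) > threshold]
        if not mined:
            break
        lex = reestimate(mined, lex)
    return mined, lex


if __name__ == "__main__":
    seed = {("la", "the"): 0.9, ("maison", "house"): 0.9, ("chat", "cat"): 0.9}
    good = (["the", "house"], ["la", "maison"])
    bad = (["the", "cat"], ["la", "maison"])
    mined, updated = mine([good, bad], seed, threshold=-3.0)
    print(mined)
```

Averaging the lexicon term per target word is a design choice made here so that long sentences are not penalized simply for having more words; it keeps scores comparable across candidate pairs of different lengths. Because `reestimate` gives unseen word pairs a small `floor` weight, words that never appeared in the seed lexicon can acquire translation probabilities from mined pairs, which loosely mirrors the vocabulary-coverage benefit the abstract describes.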