Adaptive Parallel Sentences Mining from Web Bilingual News Collection
ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
Scaling to very very large corpora for natural language disambiguation
ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
Extracting parallel sub-sentential fragments from non-parallel corpora
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Learning accurate, compact, and interpretable tree annotation
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Online large-margin training of syntactic and structural translation features
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Large scale parallel document mining for machine translation
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Feature-rich language-independent syntax-based alignment for statistical machine translation
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
IJCNLP'05 Proceedings of the Second international joint conference on Natural Language Processing
Hi-index | 0.00 |
We present a novel method to detect parallel fragments within noisy parallel corpora. Isolating these parallel fragments from the noisy data in which they are contained frees us from noisy alignments and stray links that can severely constrain translation-rule extraction. We do this with existing machinery, making use of an existing word alignment model for this task. We evaluate the quality and utility of the extracted data on large-scale Chinese-English and Arabic-English translation tasks and show significant improvements over a state-of-the-art baseline.