A systematic comparison of various statistical alignment models
Computational Linguistics
Automatic construction of English/Chinese parallel corpora
Journal of the American Society for Information Science and Technology
Computational Linguistics - Special issue on web as corpus
A program for aligning sentences in bilingual corpora
Computational Linguistics - Special issue on using large corpora: I
Identifying word translations in non-parallel texts
ACL '95 Proceedings of the 33rd annual meeting on Association for Computational Linguistics
Discriminative training and maximum entropy models for statistical machine translation
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Word sense acquisition from bilingual comparable corpora
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Statistical phrase-based translation
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Reliable measures for aligning Japanese-English news articles and sentences
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Improving Machine Translation Performance by Exploiting Non-Parallel Corpora
Computational Linguistics
Named entity transliteration with comparable corpora
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Moses: open source toolkit for statistical machine translation
ACL '07 Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions
On the use of comparable corpora to improve SMT performance
EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
Language and translation model adaptation using comparable corpora
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Fluency, adequacy, or HTER?: exploring different human judgments with a tunable MT metric
StatMT '09 Proceedings of the Fourth Workshop on Statistical Machine Translation
An empirical study on web mining of parallel data
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Parallel sentence generation from comparable corpora for improved SMT
Machine Translation
Mining a Persian-English comparable corpus for cross-language information retrieval
Information Processing and Management: an International Journal
Hi-index | 0.00 |
In this paper we present an extension of a successful simple and effective method for extracting parallel sentences from comparable corpora and we apply it to an Arabic/English NIST system. We experiment with a new TERp filter, along with WER and TER filters. We also report a comparison of our approach with that of (Munteanu and Marcu, 2005) using exactly the same corpora and show performance gain by using much lesser data. Our approach employs an SMT system built from small amounts of parallel texts to translate the source side of the non-parallel corpus. The target side texts are used, along with other corpora, in the language model of this SMT system. We then use information retrieval techniques and simple filters to create parallel data from a comparable news corpora. We evaluate the quality of the extracted data by showing that it significantly improves the performance of an SMT systems.