Source language adaptation for resource-poor machine translation

Authors:
Pidong Wang;Preslav Nakov;Hwee Tou Ng
Affiliations:
National University of Singapore, Singapore;QCRI, Doha, Qatar;National University of Singapore, Singapore
Venue:
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Year:
2012

References (15):

Learning String-Edit Distance
IEEE Transactions on Pattern Analysis and Machine Intelligence

Machine translation of very close languages
ANLC '00 Proceedings of the sixth conference on Applied natural language processing

Dialect MT: a case study between Cantonese and Mandarin
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 2

BLEU: a method for automatic evaluation of machine translation
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics

Clause restructuring for statistical machine translation
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics

Improved statistical machine translation using paraphrases
HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics

A phrase-based statistical model for SMS text normalization
COLING-ACL '06 Proceedings of the COLING/ACL on Main conference poster sessions

CCG supertags in factored statistical machine translation
StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation

Revisiting pivot language approach for machine translation
ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1

Improved statistical machine translation for resource-poor languages using related resource-rich languages
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3

Lexical normalisation of short text messages: makn sens a #twitter
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1

Translating from morphologically complex languages: a paraphrase-based approach
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1

Dialectal to standard Arabic paraphrasing to improve Arabic-English statistical machine translation
DIALECTS '11 Proceedings of the First Workshop on Algorithms and Resources for Modelling of Dialects and Language Varieties

Improving statistical machine translation for a resource-poor language using related resource-rich languages
Journal of Artificial Intelligence Research

Combining word-level and character-level models for machine translation between closely-related languages
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2

Abstract

We propose a novel, language-independent approach for improving machine translation from a resource-poor language to X by adapting a large bi-text for a related resource-rich language and X (the same target language). We assume a small bi-text for the resource-poor language to X pair, which we use to learn word-level and phrase-level paraphrases and cross-lingual morphological variants between the resource-rich and the resource-poor language; we then adapt the former to get closer to the latter. Our experiments for Indonesian/Malay–English translation show that using the large adapted resource-rich bi-text yields 6.7 BLEU points of improvement over the unadapted one and 2.6 BLEU points over the original small bi-text. Moreover, combining the small bi-text with the adapted bi-text outperforms the corresponding combinations with the unadapted bi-text by 1.5–3 BLEU points. We also demonstrate applicability to other languages and domains.
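
The word-level part of the adaptation idea can be illustrated with a minimal Python sketch. It is a deliberate simplification and not the authors' procedure: it assumes a hypothetical paraphrase table mapping each resource-rich word to candidate resource-poor words with scores, and it replaces each source-side word with its single highest-scoring candidate while leaving the target side (X) untouched. The names `adapt_sentence` and `adapt_bitext`, the table layout, the placeholder tokens, and the scores are all assumptions made for illustration; phrase-level paraphrases and morphological variants are not modelled.

```python
from typing import Dict, List, Tuple

# Hypothetical table: rich-language word -> [(poor-language word, score)].
ParaphraseTable = Dict[str, List[Tuple[str, float]]]


def adapt_sentence(tokens: List[str], table: ParaphraseTable) -> List[str]:
    """Replace each token with its best-scoring paraphrase, if it has one."""
    adapted = []
    for tok in tokens:
        options = table.get(tok)
        if options:
            # Pick the candidate with the highest score (1-best substitution).
            best_word, _ = max(options, key=lambda option: option[1])
            adapted.append(best_word)
        else:
            # Words without an entry are kept unchanged.
            adapted.append(tok)
    return adapted


def adapt_bitext(
    bitext: List[Tuple[str, str]], table: ParaphraseTable
) -> List[Tuple[str, str]]:
    """Adapt only the source side of each (source, target) sentence pair."""
    return [
        (" ".join(adapt_sentence(src.split(), table)), tgt)
        for src, tgt in bitext
    ]


if __name__ == "__main__":
    # Placeholder tokens and invented scores, for illustration only.
    toy_table: ParaphraseTable = {
        "rich_a": [("poor_a", 0.7), ("poor_a2", 0.3)],
        "rich_b": [("poor_b", 0.9)],
    }
    toy_bitext = [("rich_a shared rich_b", "target sentence in X")]
    print(adapt_bitext(toy_bitext, toy_table))
```

A one-best substitution ignores context; a fuller system would typically score alternative rewrites of the whole sentence, for example with a language model of the resource-poor language.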