Improving statistical machine translation for a resource-poor language using related resource-rich languages

  • Authors:
  • Preslav Nakov; Hwee Tou Ng

  • Affiliations:
  • Qatar Computing Research Institute, Qatar Foundation, Doha, Qatar; Department of Computer Science, National University of Singapore, Singapore

  • Venue:
  • Journal of Artificial Intelligence Research
  • Year:
  • 2012

Abstract

We propose a novel language-independent approach for improving machine translation for resource-poor languages by exploiting their similarity to resource-rich ones. More precisely, we improve the translation from a resource-poor source language X1 into a resource-rich language Y, given a bi-text containing a limited number of parallel sentences for X1-Y and a larger bi-text for X2-Y, where X2 is a resource-rich language closely related to X1. This is achieved by exploiting the vocabulary overlap and the similarities between X1 and X2 in spelling, word order, and syntax: (1) we improve the word alignments for the resource-poor language, (2) we further augment the translation model with additional translation options, and (3) we handle potential spelling differences through appropriate transliteration. The evaluation for Indonesian → English using Malay, and for Spanish → English using Portuguese while pretending that Spanish is resource-poor, shows absolute gains of up to 1.35 and 3.37 BLEU points, respectively, outperforming the best rival approaches while using much less additional data. Overall, our method cuts the amount of necessary "real" training data by a factor of 2-5.
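As a rough illustration of steps (2) and (3), the sketch below is a minimal Python example, not the authors' code: the simplified phrase-table file format (`src ||| tgt ||| score` per line), the file names, the `SPELLING_RULES`, and the merging policy are all assumptions made for illustration. It shows one way to add translation options from the related-language bi-text X2-Y for source phrases unseen in the small X1-Y bi-text, after normalizing X2 spellings toward X1.

```python
from collections import defaultdict

# Hypothetical character-level spelling rules mapping X2 spellings toward X1
# conventions; these particular rules are placeholders, not taken from the paper.
SPELLING_RULES = [("oe", "u"), ("dj", "j"), ("tj", "c")]


def transliterate(phrase: str) -> str:
    """Apply simple substring rewriting to normalize X2 spelling toward X1."""
    for src, tgt in SPELLING_RULES:
        phrase = phrase.replace(src, tgt)
    return phrase


def load_phrase_table(path: str) -> dict:
    """Read a simplified phrase table with lines of the form 'src ||| tgt ||| score'."""
    table = defaultdict(list)
    with open(path, encoding="utf-8") as f:
        for line in f:
            src, tgt, score = line.strip().split(" ||| ")
            table[src].append((tgt, float(score)))
    return table


def merge_tables(small_x1: dict, large_x2: dict) -> dict:
    """Keep all X1-Y entries; add X2-Y entries only as *additional* options
    for source phrases that the small X1-Y table does not cover."""
    merged = {src: list(opts) for src, opts in small_x1.items()}
    for src, opts in large_x2.items():
        src_norm = transliterate(src)
        if src_norm not in merged:
            merged[src_norm] = list(opts)
    return merged


if __name__ == "__main__":
    x1_table = load_phrase_table("x1-y.phrase-table")  # small, resource-poor pair
    x2_table = load_phrase_table("x2-y.phrase-table")  # large, related-language pair
    combined = merge_tables(x1_table, x2_table)
    print(f"{len(combined)} source phrases after merging")
```

The key design choice in this toy version is that entries learned from the genuine X1-Y data always take precedence, and the related-language data only fills coverage gaps; the actual paper combines the bi-texts and word alignments in more refined ways than this simple fallback.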