A hybrid morpheme-word representation for machine translation of morphologically rich languages

Authors:
Minh-Thang Luong;Preslav Nakov;Min-Yen Kan
Affiliations:
National University of Singapore, Singapore;National University of Singapore, Singapore;National University of Singapore, Singapore
Venue:
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Year:
2010

Abstract

We propose a language-independent approach for improving statistical machine translation for morphologically rich languages using a hybrid morpheme-word representation where the basic unit of translation is the morpheme, but word boundaries are respected at all stages of the translation process. Our model extends the classic phrase-based model by means of (1) word boundary-aware morpheme-level phrase extraction, (2) minimum error-rate training for a morpheme-level translation model using word-level BLEU, and (3) joint scoring with morpheme- and word-level language models. Further improvements are achieved by combining our model with the classic one. The evaluation on English to Finnish using Europarl (714K sentence pairs; 15.5M English words) shows statistically significant improvements over the classic model based on BLEU and human judgments.
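The hybrid representation described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. The marker convention (a prefix ends with "+", a suffix starts with "+"), the function names, the interpolation weights, and the toy language models are all assumptions made for illustration. The sketch shows three things: mapping morpheme tokens back to words (as would be needed to compute word-level BLEU on morpheme-level output), counting how many words a morpheme span covers (one plausible ingredient of boundary-aware phrase extraction; the abstract does not state the exact criterion), and a weighted combination of morpheme-level and word-level language model scores.

```python
from typing import Callable, List, Sequence


def word_ids(tokens: Sequence[str]) -> List[int]:
    """Assign each morpheme token the index of the word it belongs to.

    Assumed convention (not from the abstract): a prefix ends with "+",
    a suffix starts with "+", and a stem carries no marker.
    """
    ids: List[int] = []
    current = -1
    glue_next = False  # True when the previous token was a prefix
    for tok in tokens:
        attached = glue_next or tok.startswith("+")
        if not attached or current < 0:
            current += 1  # this token opens a new word
        ids.append(current)
        glue_next = tok.endswith("+")
    return ids


def join_morphemes(tokens: Sequence[str]) -> List[str]:
    """Rebuild surface words by concatenating morphemes of the same word."""
    words: List[str] = []
    for tok, wid in zip(tokens, word_ids(tokens)):
        piece = tok.strip("+")
        if wid == len(words):
            words.append(piece)
        else:
            words[wid] += piece
    return words


def words_spanned(tokens: Sequence[str], start: int, end: int) -> int:
    """Number of words touched by the morpheme span tokens[start:end]."""
    ids = word_ids(tokens)
    return ids[end - 1] - ids[start] + 1


def joint_lm_score(
    tokens: Sequence[str],
    morph_lm: Callable[[Sequence[str]], float],
    word_lm: Callable[[Sequence[str]], float],
    w_morph: float = 0.5,  # hypothetical weights; in practice they would be tuned
    w_word: float = 0.5,
) -> float:
    """Weighted sum of a morpheme-level and a word-level LM log-score."""
    return w_morph * morph_lm(tokens) + w_word * word_lm(join_morphemes(tokens))


if __name__ == "__main__":
    hyp = ["auto", "+n", "talo", "+i", "+ssa"]  # toy Finnish-like segmentation
    print(join_morphemes(hyp))
    print(words_spanned(hyp, 1, 4))
    # Toy stand-ins for real language models: they only penalize length.
    print(joint_lm_score(hyp, lambda t: -float(len(t)), lambda w: -2.0 * len(w)))
```

In a real system the two scoring callables would be backed by n-gram language models trained on morpheme-segmented and unsegmented text respectively, and the weights would be set by the tuning procedure rather than fixed by hand.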