Language models for machine translation: original vs. translated texts

Authors:
Gennadi Lembersky; Noam Ordan; Shuly Wintner
Affiliations:
University of Haifa, Haifa, Israel; University of Haifa, Haifa, Israel; University of Haifa, Haifa, Israel
Venue:
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Year:
2011

Abstract

We investigate the differences between language models compiled from original target-language texts and those compiled from texts manually translated to the target language. Corroborating established observations of Translation Studies, we demonstrate that the latter are significantly better predictors of translated sentences than the former, and hence fit the reference set better. Furthermore, translated texts yield better language models for statistical machine translation than original texts.
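
One common way to quantify how well a language model "predicts" a set of sentences is perplexity on that set, where lower is better. The sketch below illustrates the comparison under that assumption. It is not the authors' experimental setup: the bigram model, the add-one smoothing, and the three tiny corpora are all hypothetical and chosen only to keep the example self-contained.

```python
import math
from collections import Counter

def tokens_of(sentence):
    """Whitespace-tokenize and add sentence boundary markers."""
    return ["<s>"] + sentence.lower().split() + ["</s>"]

def train_bigram(sentences):
    """Collect history (unigram) and bigram counts."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        toks = tokens_of(s)
        unigrams.update(toks[:-1])            # each token that acts as a history
        bigrams.update(zip(toks, toks[1:]))
    return unigrams, bigrams

def perplexity(model, sentences, vocab_size):
    """Perplexity with add-one (Laplace) smoothing; lower means a better fit."""
    unigrams, bigrams = model
    log_prob, n = 0.0, 0
    for s in sentences:
        toks = tokens_of(s)
        for prev, cur in zip(toks, toks[1:]):
            p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)
            log_prob += math.log(p)
            n += 1
    return math.exp(-log_prob / n)

# Hypothetical toy corpora, invented for illustration only.
original = ["we must act now", "the house voted on the report"]
translated = ["it is necessary that we act", "the report was voted by the house"]
reference = ["it is necessary that the house act"]   # stands in for translated references

# Shared vocabulary of predictable tokens (all words plus the end marker).
vocab = {w for s in original + translated + reference for w in s.lower().split()}
vocab.add("</s>")

ppl_original = perplexity(train_bigram(original), reference, len(vocab))
ppl_translated = perplexity(train_bigram(translated), reference, len(vocab))
print("original-text LM perplexity:  ", ppl_original)
print("translated-text LM perplexity:", ppl_translated)
```

On this toy data the model trained on the translated sentences assigns the reference a lower perplexity, which mirrors the direction of the claim in the abstract. The numbers themselves carry no empirical weight.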