Exploring different representational units in English-to-Turkish statistical machine translation

Authors:
Kemal Oflazer;Ilknur Durgar El-Kahlout
Affiliations:
Carnegie Mellon University, Pittsburgh, PA and Sabancl University, Istanbul, Tuzla, Turkey;Sabancl University, Istanbul, Tuzla, Turkey
Venue:
StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
Year:
2007

Citing 12
Cited 10

BLEU: a method for automatic evaluation of machine translation

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Statistical phrase-based translation

NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Statistical Machine Translation with Scarce Resources Using Morpho-syntactic Information

Computational Linguistics
Modelling lexical redundancy for machine translation

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Improving statistical MT through morphological analysis

HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Learning morphological disambiguation rules for Turkish

HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
Moses: open source toolkit for statistical machine translation

ACL '07 Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions
Morphological analysis for statistical machine translation

HLT-NAACL-Short '04 Proceedings of HLT-NAACL 2004: Short Papers
Bridging the inflection morphology gap for Arabic statistical machine translation

NAACL-Short '06 Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers
Rule-based translation with statistical phrase-based post-editing

StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
Initial explorations in English to Turkish statistical machine translation

StatMT '06 Proceedings of the Workshop on Statistical Machine Translation
N-gram posterior probabilities for statistical machine translation

StatMT '06 Proceedings of the Workshop on Statistical Machine Translation

Using a maximum entropy model to build segmentation lattices for MT

NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Statistical machine translation into a morphologically complex language

CICLing'08 Proceedings of the 9th international conference on Computational linguistics and intelligent text processing
Syntax-to-morphology mapping in factored phrase-based statistical machine translation from English to Turkish

ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Unsupervised search for the optimal segmentation for statistical machine translation

ACLstudent '10 Proceedings of the ACL 2010 Student Research Workshop
Exploiting morphology and local word reordering in English-to-Turkish phrase-based statistical machine translation

IEEE Transactions on Audio, Speech, and Language Processing
A hybrid morpheme-word representation for machine translation of morphologically rich languages

EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
EMMA: a novel Evaluation Metric for Morphological Analysis

COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Simulating morphological analyzers with stochastic taggers for confidence estimation

CLEF'09 Proceedings of the 10th cross-language evaluation forum conference on Multilingual information access evaluation: text retrieval experiments
Crisis MT: developing a cookbook for MT in crisis situations

WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
Orthographic and morphological processing for English---Arabic statistical machine translation

Machine Translation

Quantified Score

Hi-index	0.00

Visualization

Abstract

We investigate different representational granularities for sub-lexical representation in statistical machine translation work from English to Turkish. We find that (i) representing both Turkish and English at the morpheme-level but with some selective morpheme-grouping on the Turkish side of the training data, (ii) augmenting the training data with "sentences" comprising only the content words of the original training data to bias root word alignment, (iii) reranking the n-best morpheme-sequence outputs of the decoder with a word-based language model, and (iv) using model iteration all provide a non-trivial improvement over a fully word-based baseline. Despite our very limited training data, we improve from 20.22 BLEU points for our simplest model to 25.08 BLEU points for an improvement of 4.86 points or 24% relative.