A comparison of merging strategies for translation of German compounds

Authors:
Sara Stymne
Affiliations:
Linköping University, Sweden
Venue:
EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop
Year:
2009

Abstract

This article investigates compound processing for translation into German in a factored statistical MT system. Compounds are handled by splitting them prior to training and merging the parts after translation. I explore eight merging strategies that use different combinations of external knowledge sources, such as word lists, and internal sources that are carried through the translation process, such as symbols or parts-of-speech. I show that for merging to be successful, some internal knowledge source is needed. I also show that an extra sequence model over parts-of-speech is useful for improving the order of compound parts in the output. The best merging results are achieved by a matching scheme for part-of-speech tags.
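
To make the idea of merging by part-of-speech matching concrete, the following is a minimal sketch and not the implementation evaluated in the article. It assumes that the translation output is a list of (word, tag) pairs, that compound parts carry a tag ending in a hypothetical `-PART` suffix (for example `N-PART`), and that a part is merged with the following token only when the base tags match. The function name, the tag names, and the lowercasing of non-initial parts are all assumptions made for illustration.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (surface form, POS tag)

# Hypothetical marker: tags such as "N-PART" denote a compound part.
PART_SUFFIX = "-PART"


def merge_by_pos_match(tokens: List[Token]) -> List[Token]:
    """Merge compound parts with the next token when the base POS tags match."""
    merged: List[Token] = []
    pending = ""      # concatenated compound parts still waiting for a head
    pending_pos = ""  # base POS tag of the pending parts

    for word, pos in tokens:
        is_part = pos.endswith(PART_SUFFIX)
        base = pos[: -len(PART_SUFFIX)] if is_part else pos

        # Tag mismatch: give up on merging and emit the parts as a word.
        if pending and base != pending_pos:
            merged.append((pending, pending_pos))
            pending, pending_pos = "", ""

        if is_part:
            # Non-initial parts are lowercased before concatenation.
            pending += word.lower() if pending else word
            pending_pos = base
        elif pending:
            # Matching head word: attach it to the collected parts.
            merged.append((pending + word.lower(), pos))
            pending, pending_pos = "", ""
        else:
            merged.append((word, pos))

    # Parts left over at the end of the sentence are emitted unmerged.
    if pending:
        merged.append((pending, pending_pos))
    return merged


if __name__ == "__main__":
    output = [("die", "ART"), ("Fremdsprachen", "N-PART"), ("Kenntnisse", "N")]
    print(merge_by_pos_match(output))
```

In this sketch the tags are the only knowledge source, which corresponds to an internal source carried through the translation process; a strategy based on an external source would instead consult, for instance, a word list of known compounds before merging.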