German compound words pose special problems for statistical machine translation systems: seeing each component of a compound in the training data is not sufficient for translating the compound successfully. Even if the compound itself has been seen during training, the system may not be able to translate it properly into two or more words. If German is the target language, the system might generate only the separated components, or it may fail to choose the correct compound. In this work, we investigate and compare different strategies for treating German compound words in statistical machine translation systems. For translation from German, we compare linguistically based and corpus-based compound splitting. For translation into German, we investigate splitting and rejoining German compounds, as well as joining potential English components. Additionally, we investigate word alignments enhanced with knowledge about the splitting points of German compounds. All methods consistently improve translation quality in both translation directions.
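To make the idea of corpus-based compound splitting concrete, the following is a minimal sketch of one common frequency-based heuristic: among all segmentations of a word into known parts, pick the one whose parts have the highest geometric mean of corpus frequencies. This is an illustration only and not necessarily the method used in this work. The function name `split_compound`, the linking elements in `fillers`, the minimum part length, and the counts in `TOY_COUNTS` are all assumptions made up for the example.

```python
from math import prod

# Hypothetical word counts, invented for illustration (not real corpus data).
TOY_COUNTS = {"aktionsplan": 2, "aktion": 900, "plan": 700, "akt": 200, "ion": 1}


def split_compound(word, freq, min_len=3, fillers=("", "s", "es")):
    """Return the segmentation of `word` with the highest geometric mean
    frequency of its parts; the unsplit word is the fallback."""
    word = word.lower()

    def candidates(rest):
        # Enumerate every segmentation of `rest` into parts found in `freq`.
        if not rest:
            yield []
            return
        for i in range(min_len, len(rest) + 1):
            stem = rest[:i]
            if stem not in freq:
                continue
            for filler in fillers:
                if not rest[i:].startswith(filler):
                    continue
                remaining = rest[i + len(filler):]
                # A linking element is only allowed between two parts,
                # never at the very end of the word.
                if filler and not remaining:
                    continue
                for tail in candidates(remaining):
                    yield [stem] + tail

    best_parts = [word]
    best_score = float(freq.get(word, 0))
    for parts in candidates(word):
        # Geometric mean of the part frequencies.
        score = prod(freq[p] for p in parts) ** (1.0 / len(parts))
        if score > best_score:
            best_parts, best_score = parts, score
    return best_parts


if __name__ == "__main__":
    print(split_compound("Aktionsplan", TOY_COUNTS))
```

With these toy counts, the segmentation into "aktion" and "plan" (with the linking "s" dropped) scores higher than both the unsplit word and the three-way segmentation, so it is selected. With real corpus counts a frequent compound may well score highest unsplit, which is the intended behaviour of the heuristic. The enumeration is exponential in the worst case, which is acceptable for single words but would need memoization for large vocabularies.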