How to avoid burning ducks: combining linguistic analysis and corpus statistics for German compound processing

Authors:
Fabienne Fritzinger; Alexander Fraser
Affiliations:
University of Stuttgart; University of Stuttgart
Venue:
WMT '10 Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
Year:
2010

Citing 10

How Effective is Stemming and Decompounding for German Text Retrieval?
Information Retrieval

Improving SMT quality with morpho-syntactic analysis
COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 2

Empirical methods for compound splitting
EACL '03 Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 1

BLEU: a method for automatic evaluation of machine translation
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics

Minimum error rate training in statistical machine translation
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1

German Compounds in Factored Statistical Machine Translation
GoTAL '08 Proceedings of the 6th international conference on Advances in Natural Language Processing

Moses: open source toolkit for statistical machine translation
ACL '07 Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions

Using a maximum entropy model to build segmentation lattices for MT
NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Meteor: an automatic metric for MT evaluation with high levels of correlation with human judgments
StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation

Statistical machine translation of German compound words
FinTAL'06 Proceedings of the 5th international conference on Advances in Natural Language Processing

Cited 3

Recursive decompounding in Afrikaans
TSD'11 Proceedings of the 14th international conference on Text, speech and dialogue

Modeling inflection and word-formation in SMT
EACL '12 Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics

Generation of compound words in statistical machine translation into compounding languages
Computational Linguistics

Abstract

Compound splitting is an important problem in many NLP applications, and it must be solved in order to address issues of data sparsity. Previous work has shown that linguistic approaches to German compound splitting produce a correct splitting more often, but that corpus-driven approaches work best for phrase-based statistical machine translation from German to English, a worrisome contradiction. We address this situation by combining linguistic analysis with corpus-driven statistics, and we obtain better results both in producing splittings that agree with a gold standard and in statistical machine translation performance.
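
The abstract does not spell out how the two sources of evidence are combined. As a rough illustration only, one plausible arrangement lets a linguistic analyzer propose candidate splittings and lets corpus frequencies choose among them. The sketch below assumes a geometric-mean frequency score, a common choice in corpus-driven splitting, and should not be read as the authors' exact method. The function names, the candidate analyses, and the frequency counts are all hypothetical.

```python
from math import prod


def geometric_mean(freqs):
    """Geometric mean of the part frequencies (0.0 if any part is unseen)."""
    return prod(freqs) ** (1.0 / len(freqs))


def pick_split(candidates, corpus_freq):
    """Choose among linguistically licensed candidates using corpus counts.

    candidates  -- tuples of word parts, assumed to come from a
                   morphological analyzer (hypothetical input here)
    corpus_freq -- dict mapping a word form to its corpus frequency
    """
    def score(parts):
        return geometric_mean([corpus_freq.get(p, 0) for p in parts])

    return max(candidates, key=score)


# Hypothetical analyses of "Aktionsplan" and made-up corpus counts.
candidates = [
    ("Aktionsplan",),        # leave the word unsplit
    ("Aktion", "Plan"),      # split into two nouns
]
corpus_freq = {"Aktionsplan": 50, "Aktion": 900, "Plan": 700}

best = pick_split(candidates, corpus_freq)
print(best)
```

Because only analyzer-approved candidates are scored, a frequent but linguistically implausible substring never enters the comparison, while the frequency score still settles cases where the analyzer returns more than one reading.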