Compound splitting is an important problem in many NLP applications, and it must be solved in order to address data sparsity. Previous work has shown that linguistic approaches to German compound splitting produce a correct splitting more often, yet corpus-driven approaches work best for phrase-based statistical machine translation from German to English, a worrisome contradiction. We address this situation by combining linguistic analysis with corpus-driven statistics, obtaining better results both in producing splittings that match a gold standard and in statistical machine translation performance.
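To make the corpus-driven side of this contrast concrete, the following is a minimal sketch of a frequency-based splitter. It is not the combined method described in the abstract. It assumes a common corpus-driven heuristic: enumerate candidate splits and keep the one whose parts have the highest geometric mean of corpus frequencies. The function names, the `min_len` parameter, and the toy frequency table are all hypothetical, and the sketch ignores linking elements (such as the German "s") and any linguistic analysis.

```python
from math import prod


def split_candidates(word, min_len=3):
    """Yield every segmentation of `word` into parts of at least `min_len` characters.

    The unsplit word is always yielded first.
    """
    yield [word]
    for i in range(min_len, len(word) - min_len + 1):
        head = word[:i]
        for rest in split_candidates(word[i:], min_len):
            yield [head] + rest


def score(parts, freq):
    """Geometric mean of the corpus frequencies of the parts (0 if any part is unseen)."""
    counts = [freq.get(p, 0) for p in parts]
    return prod(counts) ** (1.0 / len(counts))


def best_split(word, freq, min_len=3):
    """Return the highest-scoring candidate; ties favor the earlier (less split) candidate."""
    return max(split_candidates(word, min_len), key=lambda parts: score(parts, freq))


# Toy frequency table (hypothetical counts, lowercased forms).
freq = {"haus": 100, "boot": 20, "hausboot": 2, "freitag": 500, "frei": 100, "tag": 200}

hausboot_split = best_split("hausboot", freq)
freitag_split = best_split("freitag", freq)
unknown_split = best_split("xyzzyqq", freq)
```

With these toy counts, "hausboot" is split into "haus" and "boot" because its parts are far more frequent than the whole word, while "freitag" stays whole because the unsplit form outscores "frei" + "tag". A purely frequency-driven splitter like this has no notion of whether a split is linguistically valid, which is the gap that adding linguistic analysis is meant to close.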