Translating compounds is an important problem in machine translation. Because many compounds are never observed during training, they pose a challenge for translation systems. Previous decompounding methods have often been restricted to a small set of languages because they cannot handle more complex compound-forming processes. We present a novel unsupervised method that learns both the compound parts and the morphological operations needed to split compounds into those parts. The morphological operations are learned from a bilingual corpus, while monolingual corpora are used to learn and filter the set of compound part candidates. We evaluate the method within a machine translation task and show significant improvements for various languages, which demonstrates the versatility of the approach.
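To make the idea of splitting with compound parts and morphological operations concrete, the following is a minimal illustrative sketch, not the method described in the abstract. It assumes the two resources have already been obtained: a set of compound parts (`PARTS`) and a list of operations (`OPERATIONS`), each modeled here as a suffix stripped from a non-final part plus a replacement. The toy German entries, the function name `split_compound`, and the preference for the split with the fewest parts are all assumptions made for illustration; the abstract does not specify how the learned resources are represented or how competing splits are ranked.

```python
from functools import lru_cache

# Hypothetical toy resources. In the described method these would be learned
# (parts from monolingual corpora, operations from a bilingual corpus);
# here they are hand-written for illustration only.
PARTS = frozenset({"arbeit", "zimmer", "verkehr", "zeichen", "blume", "topf"})

# Each operation is (suffix_to_strip, replacement), applied to the end of a
# non-final segment. ("", "") means "no change".
OPERATIONS = (("", ""), ("s", ""), ("n", ""))


def split_compound(word, parts=PARTS, operations=OPERATIONS):
    """Return the split with the fewest parts as a tuple, or None if no split exists."""
    word = word.lower()

    @lru_cache(maxsize=None)
    def best(start):
        rest = word[start:]
        # The final part must be a known part as-is (no operation applied).
        if rest in parts:
            return (rest,)
        shortest = None
        for end in range(start + 1, len(word)):
            segment = word[start:end]
            for suffix, replacement in operations:
                if not segment.endswith(suffix):
                    continue
                # Undo the morphological operation to recover the base part.
                stem = segment[: len(segment) - len(suffix)] + replacement
                if stem not in parts:
                    continue
                tail = best(end)
                if tail is not None and (
                    shortest is None or len(tail) + 1 < len(shortest)
                ):
                    shortest = (stem,) + tail
        return shortest

    return best(0)


if __name__ == "__main__":
    for w in ["Arbeitszimmer", "Verkehrszeichen", "Blumentopf", "Zimmer", "Xyz"]:
        print(w, "->", split_compound(w))
```

In this sketch, a word that is itself a known part is returned unsplit, and a word that cannot be covered by known parts yields `None`. A real system would also need a scoring model (for example, based on part frequencies) to choose among competing splits, which this example deliberately leaves out.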