Language-independent compound splitting with morphological operations

Authors:
Klaus Macherey;Andrew M. Dai;David Talbot;Ashok C. Popat;Franz Och
Affiliations:
Google Inc., Mountain View, CA;University of Edinburgh, Edinburgh, UK;Google Inc., Mountain View, CA;Google Inc., Mountain View, CA;Google Inc., Mountain View, CA
Venue:
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Year:
2011

Abstract

Translating compounds is an important problem in machine translation. Because many compounds are never observed during training, they pose a challenge for translation systems. Previous decompounding methods have often been restricted to a small set of languages because they cannot handle more complex compound-forming processes. We present a novel unsupervised method that learns both the compound parts and the morphological operations needed to split compounds into those parts. The morphological operations are learned from a bilingual corpus, while monolingual corpora are used to learn and filter the set of compound part candidates. We evaluate the method within a machine translation task and show significant improvements for various languages, which demonstrates the versatility of the approach.
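
To make the idea of splitting with morphological operations concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that a table of candidate parts with costs and a list of morphological operations are already available (the abstract says these are learned from monolingual and bilingual corpora; here they are supplied by hand). The function name `split_compound`, the cost values, the `(suffix, replacement, cost)` encoding of an operation, and the dynamic-programming search are all assumptions made for illustration.

```python
from math import inf


def split_compound(word, part_costs, operations, min_len=3):
    """Find the cheapest split of `word` into known parts.

    part_costs: dict mapping a base-form part to a cost (hypothetical values,
                e.g. negative log-probabilities from a monolingual corpus).
    operations: list of (suffix, replacement, op_cost) tuples. A surface
                segment ending in `suffix` is mapped to a base form by
                replacing that suffix with `replacement`. ("", "", 0.0) is
                the identity operation.
    Returns (total_cost, parts); (inf, [word]) if no split is found.
    """
    n = len(word)
    best = [inf] * (n + 1)   # best[i]: cheapest cost of splitting word[:i]
    back = [None] * (n + 1)  # back[i]: (start index, base form) of last part
    best[0] = 0.0

    for end in range(1, n + 1):
        for start in range(end):
            if best[start] == inf or end - start < min_len:
                continue
            segment = word[start:end]
            for suffix, replacement, op_cost in operations:
                is_identity = not suffix and not replacement
                # Assumption: only non-final parts undergo an operation.
                if end == n and not is_identity:
                    continue
                if not segment.endswith(suffix):
                    continue
                base = segment[:len(segment) - len(suffix)] + replacement
                if base not in part_costs:
                    continue
                cost = best[start] + part_costs[base] + op_cost
                if cost < best[end]:
                    best[end] = cost
                    back[end] = (start, base)

    if best[n] == inf:
        return inf, [word]

    # Follow back-pointers from the end of the word to recover the parts.
    parts = []
    pos = n
    while pos > 0:
        start, base = back[pos]
        parts.append(base)
        pos = start
    return best[n], parts[::-1]


# Toy German example: the linking "s" in "verkehrszeichen" is deleted.
toy_parts = {"verkehr": 2.0, "zeichen": 2.0}
toy_ops = [("", "", 0.0), ("s", "", 0.5)]
print(split_compound("verkehrszeichen", toy_parts, toy_ops))
```

In this sketch an operation only rewrites the end of a non-final part, which covers linking elements such as the German "s" but not every compound-forming process; richer operations would need a different encoding.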