Generation of compound words in statistical machine translation into compounding languages
Computational Linguistics
In many languages, compounding is highly productive. A common practice for reducing data sparsity is to split compounds in the training data. Once compounds are split, however, the system risks translating their components into non-consecutive positions or in the wrong order. Furthermore, a post-processing step of compound merging is required to reconstruct compound words in the output. We present a method for increasing the chances that components that should be merged are translated into contiguous positions and in the right order. We also propose new heuristic methods for merging components that outperform all known methods, and a learning-based method that has accuracy similar to that of the heuristic method, is better at producing novel compounds, and can operate with no background linguistic resources.
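
To make the merging step concrete, the following is a minimal sketch of one simple word-list heuristic: adjacent output tokens are merged when their concatenation appears in a list of known words. It is an illustration only and not necessarily one of the methods proposed in this article; the function name `merge_compounds`, the `known_words` set, and the example sentence are all assumptions introduced here.

```python
def merge_compounds(tokens, known_words):
    """Greedily merge adjacent tokens whose concatenation is a known word.

    tokens: list of output tokens, possibly containing split compound parts.
    known_words: set of lowercased word forms (hypothetical resource, e.g.
                 collected from target-side monolingual text).
    """
    merged = []
    for token in tokens:
        if merged:
            # Candidate compound: previous token followed by the current one,
            # with the current token lowercased so that only the first
            # component keeps its capitalization.
            candidate = merged[-1] + token.lower()
            if candidate.lower() in known_words:
                merged[-1] = candidate
                continue
        merged.append(token)
    return merged


# Illustrative usage with a made-up word list.
known = {"regenschirm"}
print(merge_compounds(["der", "Regen", "schirm", "ist", "nass"], known))
```

A heuristic of this kind can only rebuild compounds that are already in the word list, and it depends on the components arriving adjacent and in the right order, which is the problem the abstract raises for split-compound translation.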