Productive generation of compound words in statistical machine translation

Authors:
Sara Stymne; Nicola Cancedda
Affiliations:
Linköping University, Linköping, Sweden; Xerox Research Centre Europe, Meylan, France
Venue:
WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
Year:
2011

Abstract

In many languages, compounding is highly productive. A common way to reduce data sparsity is to split compounds in the training data. When this is done, the system risks translating the components into non-consecutive positions or in the wrong order. Furthermore, a post-processing step of compound merging is required to reconstruct compound words in the output. We present a method for increasing the chances that components that should be merged are translated into contiguous positions and in the right order. We also propose new heuristic methods for merging components that outperform all known methods, and a learning-based method that has accuracy similar to that of the heuristic method, is better at producing novel compounds, and can operate with no background linguistic resources.
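
To make the split-then-merge pipeline concrete, here is a minimal Python sketch of the post-processing step. It is not the method proposed in the paper: the marker symbol, the function names, and the optional word-list check are all assumptions made for illustration, and the Swedish example words are only sample data.

```python
MARKER = "#"  # hypothetical symbol appended to non-final compound parts


def mark_parts(parts):
    """Mark every part except the last so a later merging step can find them."""
    return [p + MARKER for p in parts[:-1]] + [parts[-1]]


def merge_compounds(tokens, known_words=None):
    """Join each run of marked parts with the token that follows it.

    If known_words is given (an assumed filter, not taken from the paper),
    a merge is kept only when the joined form is in that set; otherwise
    the parts are emitted as separate words.
    """
    output = []
    pending = []  # marked parts waiting for a head word
    for token in tokens:
        if token.endswith(MARKER) and len(token) > 1:
            pending.append(token[:-1])
            continue
        if pending:
            candidate = "".join(pending) + token
            if known_words is None or candidate in known_words:
                output.append(candidate)
            else:
                output.extend(pending + [token])
            pending = []
        else:
            output.append(token)
    output.extend(pending)  # trailing parts with no head word stay as-is
    return output


if __name__ == "__main__":
    split = mark_parts(["fot", "boll"])
    print(split)
    print(merge_compounds(["en"] + split))
```

The sketch only merges parts that end up adjacent and in order, which is why the translation step needs to keep components contiguous and correctly ordered in the first place.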