Implementing an efficient part-of-speech tagger
Software—Practice & Experience
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
A systematic comparison of various statistical alignment models
Computational Linguistics
A practical part-of-speech tagger
ANLC '92 Proceedings of the third conference on Applied natural language processing
A non-projective dependency parser
ANLC '97 Proceedings of the fifth conference on Applied natural language processing
Improving SMT quality with morpho-syntactic analysis
COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 2
Empirical methods for compound splitting
EACL '03 Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 1
BLEU: a method for automatic evaluation of machine translation
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Minimum error rate training in statistical machine translation
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Large Margin Methods for Structured and Interdependent Output Variables
The Journal of Machine Learning Research
Statistical Machine Translation with Scarce Resources Using Morpho-syntactic Information
Computational Linguistics
EMNLP '02 Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10
Discriminative Reranking for Natural Language Parsing
Computational Linguistics
Translating with non-contiguous phrases
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Automatic evaluation of machine translation quality using n-gram co-occurrence statistics
HLT '02 Proceedings of the second international conference on Human Language Technology Research
German Compounds in Factored Statistical Machine Translation
GoTAL '08 Proceedings of the 6th international conference on Advances in Natural Language Processing
Segmentation for English-to-Arabic statistical machine translation
HLT-Short '08 Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers
Moses: open source toolkit for statistical machine translation
ACL '07 Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions
COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
A comparison of merging strategies for translation of German compounds
EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop
Predicting success in machine translation
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Using a maximum entropy model to build segmentation lattices for MT
NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Getting to know Moses: initial experiments on German--English factored translation
StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
Meteor: an automatic metric for MT evaluation with high levels of correlation with human judgments
StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
Effects of morphological analysis in translation between German and English
StatMT '08 Proceedings of the Third Workshop on Statistical Machine Translation
Towards better machine translation quality for the German--English language pairs
StatMT '08 Proceedings of the Third Workshop on Statistical Machine Translation
Experiments in morphosyntactic processing for translating to and from German
StatMT '09 Proceedings of the Fourth Workshop on Statistical Machine Translation
Initial explorations in English to Turkish statistical machine translation
StatMT '06 Proceedings of the Workshop on Statistical Machine Translation
Unsupervised and knowledge-free learning of compound splits and periphrases
CICLing'08 Proceedings of the 9th international conference on Computational linguistics and intelligent text processing
WMT '10 Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
Language-independent compound splitting with morphological operations
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Better hypothesis testing for statistical machine translation: controlling for optimizer instability
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Statistical machine translation of german compound words
FinTAL'06 Proceedings of the 5th international conference on Advances in Natural Language Processing
A formal model of ambiguity and its applications in machine translation
A formal model of ambiguity and its applications in machine translation
Productive generation of compound words in statistical machine translation
WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
Hi-index | 0.00 |
In this article we investigate statistical machine translation SMT into Germanic languages, with a focus on compound processing. Our main goal is to enable the generation of novel compounds that have not been seen in the training data. We adopt a split-merge strategy, where compounds are split before training the SMT system, and merged after the translation step. This approach reduces sparsity in the training data, but runs the risk of placing translations of compound parts in non-consecutive positions. It also requires a postprocessing step of compound merging, where compounds are reconstructed in the translation output. We present a method for increasing the chances that components that should be merged are translated into contiguous positions and in the right order and show that it can lead to improvements both by direct inspection and in terms of standard translation evaluation metrics. We also propose several new methods for compound merging, based on heuristics and machine learning, which outperform previously suggested algorithms. These methods can produce novel compounds and a translation with at least the same overall quality as the baseline. For all subtasks we show that it is useful to include part-of-speech based information in the translation process, in order to handle compounds.