Scaling phrase-based statistical machine translation to larger corpora and longer phrases
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Language model adaptation for statistical machine translation with structured query models
COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Exploiting Parallel Treebanks to Improve Phrase-Based Statistical Machine Translation
CICLing '09 Proceedings of the 10th International Conference on Computational Linguistics and Intelligent Text Processing
Domain adaptation for statistical machine translation with domain dictionary and monolingual corpora
COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
Domain adaptation for statistical machine translation with monolingual resources
StatMT '09 Proceedings of the Fourth Workshop on Statistical Machine Translation
Stream-based translation models for statistical machine translation
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Training phrase translation models with leaving-one-out
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Intelligent selection of language model training data
ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers
2010 failures in English-Czech phrase-based MT
WMT '10 Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
Adaptive development data selection for log-linear model in statistical machine translation
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Better hypothesis testing for statistical machine translation: controlling for optimizer instability
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Domain adaptation for machine translation by mining unseen words
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Investigations on translation model adaptation using monolingual data
WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
The CMU-ARK German-English translation system
WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
Domain adaptation via pseudo in-domain data selection
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Probes in a taxonomy of factored phrase-based models
WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
Findings of the 2012 workshop on statistical machine translation
WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
Hi-index | 0.00 |
We provide a few insights on data selection for machine translation. We evaluate the quality of the new CzEng 1.0, a parallel data source used in WMT12. We describe a simple technique for reducing out-of-vocabulary rate after phrase extraction. We discuss the benefits of tuning towards multiple reference translations for English-Czech language pair. We introduce a novel approach to data selection by full-text indexing and search: we select sentences similar to the test set from a large monolingual corpus and explore several options of incorporating them in a machine translation system. We show that this method can improve translation quality. Finally, we describe our submitted system CU-TAMCH-BOJ.