Current statistical machine translation (SMT) systems are trained on sentence-aligned and word-aligned parallel text collected from various sources. Translation model parameters are estimated from the word alignments, and translation quality on a given test set depends on those parameter estimates. At least two factors affect parameter estimation: how well the training data matches the test domain, and the quality of the training data. This paper describes a novel approach that automatically detects and down-weights certain parts of the training corpus by assigning a weight to each sentence in the training bitext so as to optimize a discriminative objective function on a designated tuning set. In this way, the proposed method can limit the negative effects of low-quality training data and can adapt the translation model to the domain of interest. Such discriminative corpus weights are shown to provide significant improvements in Arabic-English translation under various conditions, using a state-of-the-art SMT system.
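The abstract does not say how the sentence weights enter the translation model or how they are optimized. As a minimal sketch of the general idea only, the Python below assumes the weights scale each sentence's contribution to relative-frequency phrase translation estimates. The function name `weighted_translation_probs`, the toy phrase pairs, and the weight values are hypothetical illustrations, not taken from the paper, and the discriminative learning of the weights on a tuning set is not shown.

```python
from collections import defaultdict


def weighted_translation_probs(phrase_pairs_per_sentence, sentence_weights):
    """Estimate p(target | source) from sentence-weighted phrase-pair counts.

    Assumption for illustration: each extracted phrase pair contributes its
    sentence's weight (instead of 1) to the relative-frequency counts.
    """
    pair_counts = defaultdict(float)
    source_counts = defaultdict(float)
    for pairs, weight in zip(phrase_pairs_per_sentence, sentence_weights):
        for source, target in pairs:
            pair_counts[(source, target)] += weight
            source_counts[source] += weight
    # Skip sources whose total weight is zero to avoid division by zero.
    return {
        (source, target): count / source_counts[source]
        for (source, target), count in pair_counts.items()
        if source_counts[source] > 0
    }


# Hypothetical toy bitext: one list of extracted phrase pairs per sentence.
corpus = [
    [("kitab", "book")],
    [("kitab", "book")],
    [("kitab", "reservation")],  # stands in for a noisy or out-of-domain sentence
]

uniform = weighted_translation_probs(corpus, [1.0, 1.0, 1.0])
downweighted = weighted_translation_probs(corpus, [1.0, 1.0, 0.25])
```

With uniform weights the third sentence contributes a full count to the estimate for "kitab"; lowering its weight shifts probability mass toward the translation supported by the other two sentences. In the method the abstract describes, such weights would be chosen to optimize a discriminative objective on the tuning set rather than set by hand.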