Training the phrase table by force-aligning (FA) the training data with the reference translations has been shown to improve phrasal translation quality while significantly reducing phrase table size on medium-sized tasks. We apply this procedure to several large-scale tasks, with the primary goal of reducing model sizes without sacrificing translation quality. To cope with the noise in automatically crawled parallel training data, we introduce on-demand word deletions, insertions, and backoffs, achieving an alignment success rate of over 99%. We also add heuristics to avoid any increase in OOV rates. We are able to reduce already heavily pruned baseline phrase tables by more than 50% with little to no degradation in quality, and occasionally a slight improvement, without any increase in OOVs. We further introduce two global scaling factors for re-estimating the phrase table from posterior phrase alignment probabilities, and a modified absolute discounting method that can be applied to fractional counts.
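The abstract mentions an absolute discounting variant adapted to fractional counts, as arise when phrase pairs are weighted by posterior alignment probabilities. The paper's exact formulation is not given here; the sketch below illustrates one plausible variant in which the discount subtracted from each count is capped at the count itself (so small fractional counts are never driven negative) and the freed probability mass is redistributed uniformly over the observed targets of each source phrase. The function name and the uniform redistribution are illustrative assumptions, not the authors' method.

```python
from collections import defaultdict

def discounted_probs(counts, D=0.5):
    """Absolute discounting over fractional phrase-pair counts (sketch).

    counts: dict mapping (source, target) -> fractional count.
    Returns p(target | source): each count is discounted by min(D, c),
    and the mass freed per source phrase is shared uniformly among its
    observed targets, so the distribution still sums to one.
    """
    by_src = defaultdict(dict)
    for (s, t), c in counts.items():
        by_src[s][t] = c

    probs = {}
    for s, tgts in by_src.items():
        total = sum(tgts.values())
        # Cap the discount at the count itself, which keeps fractional
        # counts below D from going negative.
        freed = sum(min(D, c) for c in tgts.values())
        share = freed / len(tgts)
        for t, c in tgts.items():
            probs[(s, t)] = (max(c - D, 0.0) + share) / total
    return probs
```

With integer counts this reduces to standard absolute discounting with uniform backoff; the `min(D, c)` cap is what makes it well-defined for counts smaller than the discount.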