We present an extensive experimental study of phrase-based statistical machine translation from the point of view of its learning capabilities. Using high-performance computing, we obtain very accurate learning curves and provide extrapolations of the system's projected performance under different conditions. Our experiments confirm existing, and mostly unpublished, beliefs about the learning capabilities of statistical machine translation systems. We also provide insight into how statistical machine translation learns from data, including the respective influence of the translation and language models, the impact of phrase length on performance, and various unlearning and perturbation analyses. Our results support and illustrate the fact that performance improves by a constant amount for each doubling of the data, across different language pairs and different systems. This fundamental limitation seems to be a direct consequence of Zipf's law governing textual data. Although the rate of improvement may depend on both the data and the estimation method, it is unlikely that the general shape of the learning curve will change without major changes in the modeling and inference phases. Possible research directions that address this issue include the integration of linguistic rules and the development of active learning procedures.
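A constant gain per doubling of the data corresponds to a learning curve of the form score(N) ≈ a + b·log2(N), where b is the gain per doubling. The following Python sketch fits such a curve by ordinary least squares and extrapolates it to larger training sets. It also computes, for a Zipf law with exponent 1, the expected probability mass of word types never observed in N training tokens. This quantity also shrinks by a roughly constant amount per doubling, which is one plausible reading of the Zipf argument, not the authors' own analysis. The function names are hypothetical, the scores are invented for illustration, and the Zipf exponent s = 1 and vocabulary size of 100,000 are assumptions.

```python
import math


def fit_log_linear(sizes, scores):
    """Ordinary least-squares fit of score = a + b * log2(size).

    Under this model, score(2N) - score(N) = b, so b is the
    (constant) gain per doubling of the training data.
    """
    xs = [math.log2(n) for n in sizes]
    mx = sum(xs) / len(xs)
    my = sum(scores) / len(scores)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, scores))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def extrapolate(a, b, size):
    """Projected score at a training-set size outside the measured range."""
    return a + b * math.log2(size)


def zipf_unseen_mass(num_tokens, vocab_size, s=1.0):
    """Expected probability mass of word types never observed in
    num_tokens i.i.d. draws from a Zipf(s) law over vocab_size types:
    sum_r p_r * (1 - p_r) ** num_tokens.
    Assumption: an idealized Zipf model, not real corpus statistics.
    """
    weights = [1.0 / r ** s for r in range(1, vocab_size + 1)]
    z = sum(weights)
    return sum((w / z) * (1.0 - w / z) ** num_tokens for w in weights)


if __name__ == "__main__":
    # Made-up scores for illustration only (not results from the study).
    sizes = [10_000, 20_000, 40_000, 80_000, 160_000]
    scores = [20.0, 21.5, 23.0, 24.5, 26.0]
    a, b = fit_log_linear(sizes, scores)
    print(round(b, 3))  # 1.5 (gain per doubling)
    print(round(extrapolate(a, b, 1_280_000), 3))  # 30.5 (three more doublings)

    # Under Zipf(1), the unseen mass shrinks by a roughly constant amount per doubling.
    V = 100_000
    masses = [zipf_unseen_mass(2 ** k, V) for k in range(10, 17)]
    for k, (m1, m2) in enumerate(zip(masses, masses[1:]), start=10):
        print(f"2^{k} -> 2^{k + 1} tokens: unseen mass drops by {m1 - m2:.4f}")
```

For Zipf(1) over V types, each doubling removes approximately ln 2 / H_V of unseen mass, where H_V is the V-th harmonic number (about 0.057 for V = 100,000), as long as N stays well below V·H_V. Because rare types keep appearing at every scale, each doubling adds roughly the same amount of new coverage rather than an increasing amount.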