Suffix arrays: a new method for on-line string searches
SODA '90 Proceedings of the first annual ACM-SIAM symposium on Discrete algorithms
A systematic comparison of various statistical alignment models
Computational Linguistics
The mathematics of statistical machine translation: parameter estimation
Computational Linguistics - Special issue on using large corpora: II
HMM-based word alignment in statistical translation
COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 2
BLEU: a method for automatic evaluation of machine translation
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Minimum error rate training in statistical machine translation
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Scaling phrase-based statistical machine translation to larger corpora and longer phrases
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Hierarchical Phrase-Based Translation
Computational Linguistics
Tera-scale translation models via pattern matching
COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
Online EM for unsupervised models
NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Experiments in domain adaptation for statistical machine translation
StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
Joshua: an open source toolkit for parsing-based machine translation
StatMT '09 Proceedings of the Fourth Workshop on Statistical Machine Translation
Domain adaptation for statistical machine translation with monolingual resources
StatMT '09 Proceedings of the Fourth Workshop on Statistical Machine Translation
Stream-based randomised language models for SMT
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2 - Volume 2
Dynamic extended suffix arrays
Journal of Discrete Algorithms
Approximate scalable bounded space sketch for large data NLP
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Active learning for interactive machine translation
EACL '12 Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics
LetsMT!: a cloud-based platform for do-it-yourself machine translation
ACL '12 Proceedings of the ACL 2012 System Demonstrations
Fast large-scale approximate graph construction for NLP
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Selecting data for English-to-Czech machine translation
WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
Hi-index | 0.00 |
Typical statistical machine translation systems are trained with static parallel corpora. Here we account for scenarios with a continuous incoming stream of parallel training data. Such scenarios include daily governmental proceedings, sustained output from translation agencies, or crowd-sourced translations. We show incorporating recent sentence pairs from the stream improves performance compared with a static baseline. Since frequent batch retraining is computationally demanding we introduce a fast incremental alternative using an online version of the EM algorithm. To bound our memory requirements we use a novel data-structure and associated training regime. When compared to frequent batch retraining, our online time and space-bounded model achieves the same performance with significantly less computational overhead.