Combining labeled and unlabeled data with co-training
COLT' 98 Proceedings of the eleventh annual conference on Computational learning theory
Extension of Zipf's law to words and phrases
COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
BLEU: a method for automatic evaluation of machine translation
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Minimum error rate training in statistical machine translation
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
The Alignment Template Approach to Statistical Machine Translation
Computational Linguistics
Bootstrapping parallel corpora
HLT-NAACL-PARALLEL '03 Proceedings of the HLT-NAACL 2003 Workshop on Building and using parallel texts: data driven machine translation and beyond - Volume 3
Indirect-HMM-based hypothesis alignment for combining outputs from machine translation systems
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Active learning for statistical phrase-based machine translation
NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Bucking the trend: large-scale cost-focused active learning for statistical machine translation
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Active learning-based elicitation for semi-supervised word alignment
ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers
Using variance as a stopping criterion for active learning of frame assignment
ALNLP '10 Proceedings of the NAACL HLT 2010 Workshop on Active Learning for Natural Language Processing
Evaluating the impact of coder errors on active learning
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Uncertainty-based active learning with instability estimation for text classification
ACM Transactions on Speech and Language Processing (TSLP)
Instance selection for machine translation using feature decay algorithms
WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
Hi-index | 0.00 |
Statistical machine translation (SMT) models require bilingual corpora for training, and these corpora are often multilingual with parallel text in multiple languages simultaneously. We introduce an active learning task of adding a new language to an existing multilingual set of parallel text and constructing high quality MT systems, from each language in the collection into this new target language. We show that adding a new language using active learning to the EuroParl corpus provides a significant improvement compared to a random sentence selection baseline. We also provide new highly effective sentence selection methods that improve AL for phrase-based SMT in the multilingual and single language pair setting.