Selective Sampling Using the Query by Committee Algorithm
Machine Learning
BLEU: a method for automatic evaluation of machine translation
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Statistical phrase-based translation
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Minimum error rate training in statistical machine translation
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Confidence estimation for machine translation
COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Active learning for statistical phrase-based machine translation
NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Hi-index | 0.00 |
The development of high-performance statistical machine translation (SMT) systems is contingent on the availability of substantial, in-domain parallel training corpora. The latter, however, are expensive to produce due to the labor-intensive nature of manual translation. We propose to alleviate this problem with a novel, semi-supervised, batch-mode active learning strategy that attempts to maximize in-domain coverage by selecting sentences, which represent a balance between domain match, translation difficulty, and batch diversity. Simulation experiments on an English-to-Pashto translation task show that the proposed strategy not only outperforms the random selection baseline, but also traditional active selection techniques based on dissimilarity to existing training data.