Producing parallel training corpora for statistical machine translation (SMT) systems for resource-poor languages usually requires extensive manual effort. Active sample selection aims to reduce the labor, time, and expense incurred in producing such resources, attaining a given performance benchmark with the smallest possible training corpus by choosing informative, non-redundant source sentences from an available candidate pool for manual translation. We present a novel, discriminative sample selection strategy that preferentially selects batches of candidate sentences containing constructs that lead to erroneous translations on a held-out development set. The proposed strategy includes a built-in diversity mechanism that reduces redundancy within the selected batches. Simulation experiments on English-to-Pashto and Spanish-to-English translation tasks demonstrate the superiority of the proposed approach over several competing techniques, including random selection, dissimilarity-based selection, and a recently proposed semi-supervised active learning strategy.
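The selection loop described above can be sketched in a few lines. This is a minimal illustration, not the paper's actual method: it assumes error scores for source n-grams have already been estimated from translations of a held-out development set (the `error_weights` table below is a hypothetical input), scores each candidate by its coverage of error-prone n-grams, and greedily builds a batch while geometrically discounting n-grams already covered, which is one simple way to realize the built-in diversity mechanism.

```python
from collections import Counter

def ngrams(tokens, n=2):
    """All n-grams of a token list (bigrams by default)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def select_batch(candidates, error_weights, batch_size, diversity=0.5):
    """Greedy error-driven batch selection with a diversity penalty.

    candidates    : list of tokenized source sentences (the candidate pool)
    error_weights : Counter mapping source n-grams to error scores,
                    assumed to be estimated on a held-out development set
    diversity     : factor discounting n-grams already covered by the batch
    Returns the indices of the selected sentences.
    """
    covered = Counter()                    # n-grams already in the batch
    remaining = list(range(len(candidates)))
    batch = []
    for _ in range(min(batch_size, len(remaining))):
        def score(i):
            grams = ngrams(candidates[i])
            if not grams:
                return 0.0
            # error-prone n-grams count less once the batch covers them
            raw = sum(error_weights[g] * (diversity ** covered[g])
                      for g in grams)
            return raw / len(grams)        # length-normalize
        best = max(remaining, key=score)
        batch.append(best)
        remaining.remove(best)
        covered.update(ngrams(candidates[best]))
    return batch

# Toy pool: the second sentence repeats the error-prone bigram of the
# first, so the diversity discount makes the selector skip it.
pool = [["the", "cat", "sat"], ["the", "cat", "ran"], ["a", "dog", "ran"]]
weights = Counter({("the", "cat"): 2.0, ("dog", "ran"): 1.5})
print(select_batch(pool, weights, batch_size=2))  # → [0, 2]
```

Without the discount (`diversity=1.0`), the two near-duplicate sentences would both be chosen; the penalty is what trades raw informativeness against redundancy within a batch.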