This paper presents an exponential model, scalable to very large datasets, for translation into highly inflected languages. As in other recent proposals, it predicts target-side phrases and can be conditioned on source-side context. Crucially for modeling morphological generalizations, however, it estimates feature parameters from the entire training set rather than as a collection of separate classifiers. We apply it to English-Czech translation, using a variety of features that capture potential predictors of case, number, and gender, and one of the largest publicly available parallel datasets. We also describe the generation and modeling of inflected forms unobserved in training data, and decoding procedures for a model with non-local target-side feature dependencies.
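To make the contrast with per-phrase classifiers concrete, the following is a minimal sketch of an exponential (log-linear) model whose feature weights are shared across all source phrases. It is an illustration under stated assumptions, not the paper's implementation: the feature templates, the tag strings such as `case=ins`, the context cue `prep=with`, and the function names are all hypothetical, and the weight is set by hand rather than estimated from data.

```python
import math
from collections import defaultdict


def features(source_phrase, context, target_phrase, target_tags):
    """Hypothetical feature templates (not taken from the paper).

    One lexical feature ties the source phrase to the target phrase; the
    remaining features pair each source-side context cue with each
    target-side morphological tag and do not mention the phrase itself.
    """
    feats = defaultdict(float)
    feats[f"phrase:{source_phrase}=>{target_phrase}"] += 1.0
    for cue in context:
        for tag in target_tags:
            feats[f"ctx:{cue}&tag:{tag}"] += 1.0
    return feats


def score(weights, feats):
    """Linear score w . f(x, y); features without a weight contribute 0."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())


def target_distribution(weights, source_phrase, context, candidates):
    """p(target | source, context), proportional to exp(w . f).

    `candidates` maps each target phrase to a tuple of morphological tags.
    """
    scores = {
        target: score(weights, features(source_phrase, context, target, tags))
        for target, tags in candidates.items()
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {target: math.exp(s - top) for target, s in scores.items()}
    z = sum(exps.values())
    return {target: value / z for target, value in exps.items()}


# A single weight vector shared by every source phrase (assumed values).
weights = {"ctx:prep=with&tag:case=ins": 2.0}
context = ("prep=with",)

p_car = target_distribution(
    weights, "the car", context,
    {"auto": ("case=nom",), "autem": ("case=ins",)},
)
p_castle = target_distribution(
    weights, "the castle", context,
    {"hrad": ("case=nom",), "hradem": ("case=ins",)},
)
```

Because the context-tag weight is not tied to any particular phrase, the same parameter raises the probability of the instrumental form for both "the car" and "the castle". A collection of separate per-phrase classifiers would have to learn that preference independently for each source phrase.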