A semantic feature for statistical machine translation, based on Latent Semantic Indexing, is proposed and evaluated. The feature accounts for the degree of similarity between a given input sentence and each individual sentence in the training dataset. This similarity is computed in a reduced vector space constructed by means of the Latent Semantic Indexing decomposition. The computed similarity values are used as an additional feature in the log-linear model combination approach to statistical machine translation. In our implementation, the proposed feature is dynamically adjusted for each translation unit in the translation table according to the current input sentence. The model aims to favor translation units that were extracted from training sentences semantically related to the current input sentence. Experimental results on a Spanish-to-English translation task on the Bible corpus demonstrate a significant improvement in translation quality with respect to a baseline system.
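The similarity computation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`build_lsi`, `project`, `cosine_similarities`, `unit_feature`), the raw term counts, the cosine measure, the choice of `k`, and the use of the maximum over a unit's source sentences are all assumptions made for illustration, and the toy sentences are invented rather than taken from the Bible corpus.

```python
import numpy as np

def build_lsi(train_sentences, k=2):
    """Build a term-by-sentence count matrix and reduce it with a truncated SVD."""
    vocab = sorted({w for s in train_sentences for w in s.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(train_sentences)))
    for j, sentence in enumerate(train_sentences):
        for w in sentence.lower().split():
            A[index[w], j] += 1.0
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(S))
    U_k, S_k = U[:, :k], S[:k]
    # Row j is training sentence j in the reduced space: U_k^T a_j = S_k * Vt_k[:, j].
    train_vecs = Vt[:k, :].T * S_k
    return index, U_k, train_vecs

def project(sentence, index, U_k):
    """Map a new sentence into the reduced space; unknown words are ignored."""
    q = np.zeros(U_k.shape[0])
    for w in sentence.lower().split():
        if w in index:
            q[index[w]] += 1.0
    return U_k.T @ q

def cosine_similarities(q_vec, train_vecs):
    """Cosine similarity between the input vector and every training sentence."""
    denom = np.linalg.norm(train_vecs, axis=1) * np.linalg.norm(q_vec)
    sims = np.zeros(len(train_vecs))
    nonzero = denom > 0
    sims[nonzero] = (train_vecs[nonzero] @ q_vec) / denom[nonzero]
    return sims

def unit_feature(sims, source_sentence_ids):
    """Assumed aggregation: best similarity among the training sentences
    from which a translation unit was extracted."""
    return float(max(sims[i] for i in source_sentence_ids))

# Toy data (hypothetical, for illustration only).
train = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "stock prices fell sharply today",
]
index, U_k, train_vecs = build_lsi(train, k=2)
sims = cosine_similarities(project("a cat on a mat", index, U_k), train_vecs)
# Hypothetical translation unit extracted from training sentences 0 and 1.
print(unit_feature(sims, [0, 1]))
```

In a log-linear system, a value like the one returned by `unit_feature` would be recomputed for each input sentence and attached to the corresponding translation-table entry as one more feature score, which is what makes the feature dynamic.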