A semantic feature for statistical machine translation

Authors:
Rafael E. Banchs;Marta R. Costa-jussà
Affiliations:
Institute for Infocomm Research, Singapore;Barcelona Media Innovation Centre, Barcelona
Venue:
SSST-5 Proceedings of the Fifth Workshop on Syntax, Semantics and Structure in Statistical Translation
Year:
2011

Abstract

A semantic feature for statistical machine translation, based on Latent Semantic Indexing, is proposed and evaluated. The feature measures the degree of similarity between a given input sentence and each individual sentence in the training dataset. This similarity is computed in a reduced vector space constructed by means of the Latent Semantic Indexing decomposition, and the resulting similarity values are used as an additional feature in the log-linear model combination approach to statistical machine translation. In our implementation, the feature is dynamically adjusted for each translation unit in the translation table according to the current input sentence. The model thus favors translation units extracted from training sentences that are semantically related to the input sentence being translated. Experimental results on a Spanish-to-English translation task on the Bible corpus show a significant improvement in translation quality with respect to a baseline system.
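
The dynamic adjustment described in the abstract can be illustrated with a minimal sketch. It assumes that sentence vectors have already been projected into the reduced Latent Semantic Indexing space, and it scores each translation unit by the highest cosine similarity between the input sentence and the training sentences the unit was extracted from. The abstract does not state how similarities are aggregated per unit, so the use of the maximum is an assumption; the names `cosine`, `semantic_feature`, `train_vecs`, and `units`, as well as the toy vectors and phrase pairs, are hypothetical and not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def semantic_feature(input_vec, origin_ids, train_vecs):
    """Feature value for one translation unit.

    Assumption (not stated in the abstract): take the best similarity over
    the training sentences from which the unit was extracted.
    """
    return max(cosine(input_vec, train_vecs[i]) for i in origin_ids)


# Hypothetical reduced-space vectors for two training sentences,
# assumed to come from a precomputed LSI projection.
train_vecs = {
    0: [0.9, 0.1],  # e.g. a sentence about finance
    1: [0.1, 0.9],  # e.g. a sentence about furniture
}

# Hypothetical translation units sharing one source phrase; "origins" lists
# the training sentences each unit was extracted from.
units = [
    {"src": "banco", "tgt": "bank", "origins": [0]},
    {"src": "banco", "tgt": "bench", "origins": [1]},
]

# Hypothetical reduced-space vector of the current input sentence.
input_vec = [1.0, 0.0]

# The feature is recomputed for every unit each time the input sentence changes.
scores = {
    u["tgt"]: semantic_feature(input_vec, u["origins"], train_vecs) for u in units
}
best = max(scores, key=scores.get)
```

In this toy setting the unit extracted from the training sentence closest to the input receives the higher feature value. In a log-linear system, such a value would enter as one more weighted feature alongside the usual translation and language model scores.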