In this paper, we question the homogeneity of a large parallel corpus by measuring the similarity between its sub-parts. We compare results obtained with a general measure of lexical similarity based on χ² against results obtained by counting discourse connectives. We argue that discourse connectives provide a more sensitive measure, revealing differences that are not visible with the general measure. We also provide evidence that translated texts have specific characteristics that set them apart from non-translated ones, due to a universal tendency toward explicitation.
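The two kinds of measure contrasted above can be illustrated with a minimal sketch. It is not the procedure used in the paper: the χ² statistic here is a generic frequency-based comparison over the most frequent words of two sub-corpora, and the function names, the `top_n` cutoff, and the four-word connective list are all hypothetical choices made for illustration. Multi-word connectives are ignored.

```python
from collections import Counter

def chi_square_similarity(tokens_a, tokens_b, top_n=500):
    """Chi-square statistic over the top_n most frequent words of both
    sub-corpora combined. Lower values mean more similar word distributions.
    Generic sketch; the paper's exact formulation may differ."""
    if not tokens_a or not tokens_b:
        raise ValueError("both sub-corpora must be non-empty")
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    n_a, n_b = len(tokens_a), len(tokens_b)
    total = n_a + n_b
    joint = freq_a + freq_b
    chi2 = 0.0
    for word, count in joint.most_common(top_n):
        # Expected counts if both sub-corpora were drawn from one population.
        expected_a = count * n_a / total
        expected_b = count * n_b / total
        chi2 += (freq_a[word] - expected_a) ** 2 / expected_a
        chi2 += (freq_b[word] - expected_b) ** 2 / expected_b
    return chi2

# Hypothetical, deliberately tiny list of single-word connectives.
CONNECTIVES = frozenset({"however", "because", "although", "therefore"})

def connective_rate(tokens, connectives=CONNECTIVES, per=1000):
    """Number of connective tokens per `per` tokens (case-insensitive)."""
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token.lower() in connectives)
    return per * hits / len(tokens)
```

Under these assumptions, two sub-parts can have a low χ² score and still differ in `connective_rate`, which is the kind of difference the connective-based measure is meant to expose.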