As machine translation (MT) tools have become mainstream, machine-translated text has increasingly appeared on multilingual websites. Trustworthy multilingual websites are used as training corpora for statistical machine translation systems, so large amounts of MT text in the training data may make such systems less effective. We performed three experiments to determine whether a support vector machine (SVM) could distinguish machine-translated text from human-written text (both original text and human translations). Machine-translated versions of the Canadian Hansard were detected with an F-measure of 0.999. Machine-translated versions of six Government of Canada websites were detected with an F-measure of 0.98. We validated these results with a decision tree classifier. An experiment to find MT text on Government of Ontario websites using Government of Canada training data was unfruitful, yielding a high rate of false positives. Machine-translated text thus appears to be learnable and detectable when a similar training corpus is used.
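The experiments above report results as F-measures over a binary MT-vs-human classification. As a minimal sketch of how such a score is computed (the function name, labels, and toy predictions below are illustrative, not taken from the paper's setup), treating "MT" as the positive class:

```python
def f_measure(gold, pred, positive="MT"):
    """Compute the F1 score for the given positive class.

    F1 is the harmonic mean of precision (fraction of predicted
    positives that are correct) and recall (fraction of true
    positives that were found).
    """
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy illustration: 4 sentences, one human-written sentence
# wrongly flagged as MT (a false positive).
gold = ["MT", "MT", "HUMAN", "HUMAN"]
pred = ["MT", "MT", "MT", "HUMAN"]
print(f_measure(gold, pred))  # precision 2/3, recall 1.0 -> F1 = 0.8
```

An F-measure of 0.999, as reported for the Hansard experiment, means both precision and recall were near-perfect; the failed cross-domain Ontario experiment corresponds to precision collapsing from many false positives while this same formula is applied.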