Identification of translationese: a machine learning approach

  • Authors: Iustina Ilisei; Diana Inkpen; Gloria Corpas Pastor; Ruslan Mitkov

  • Affiliations: Research Institute in Information and Language Processing, University of Wolverhampton, Wolverhampton, United Kingdom; School of Information Technology and Engineering, University of Ottawa, Ottawa, Canada; Department of Translation and Interpreting, University of Málaga, Málaga, Spain; Research Institute in Information and Language Processing, University of Wolverhampton, Wolverhampton, United Kingdom

  • Venue: CICLing'10: Proceedings of the 11th International Conference on Computational Linguistics and Intelligent Text Processing
  • Year: 2010

Abstract

This paper presents a machine learning approach to the study of translationese. The goal is to train a computer system to distinguish between translated and non-translated text, in order to determine the characteristic features that influence the classifiers. Several algorithms reach a success rate of up to 97.62% on a technical dataset. Moreover, the SVM classifier consistently achieves a statistically significant improvement in accuracy when simplification features are added to the basic translational classifier system. These findings may therefore be considered an argument for the existence of the Simplification Universal.
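
To make the idea of "simplification features" concrete, the following is a minimal sketch of how such indicators might be computed for a single text before being passed to a classifier. The abstract does not list the features actually used, so the three measures here (type-token ratio, mean sentence length, mean word length) are assumptions chosen as common proxies for lexical and syntactic simplicity, and the function name `simplification_features` is hypothetical.

```python
import re


def simplification_features(text):
    """Return a few illustrative simplification indicators for one text.

    Assumption: these three measures are stand-ins for the paper's
    feature set, which the abstract does not enumerate.
    """
    # Naive sentence split on terminal punctuation; empty pieces are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Lower-cased alphabetic tokens only (digits and underscores excluded).
    tokens = re.findall(r"[^\W\d_]+", text.lower())

    if not tokens or not sentences:
        return {
            "type_token_ratio": 0.0,
            "mean_sentence_length": 0.0,
            "mean_word_length": 0.0,
        }

    return {
        # Lower values suggest a less varied vocabulary.
        "type_token_ratio": len(set(tokens)) / len(tokens),
        # Average number of tokens per sentence.
        "mean_sentence_length": len(tokens) / len(sentences),
        # Average number of characters per token.
        "mean_word_length": sum(len(t) for t in tokens) / len(tokens),
    }


sample = "The cat sat. The cat ran."
features = simplification_features(sample)
```

In an experiment of the kind the abstract describes, one feature vector like this would be computed per document and appended to the baseline features, so that classifier accuracy with and without the additional columns could be compared.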