Identification of translationese: a machine learning approach

Authors:
Iustina Ilisei;Diana Inkpen;Gloria Corpas Pastor;Ruslan Mitkov
Affiliations:
Research Institute in Information and Language Processing, University of Wolverhampton, Wolverhampton, United Kingdom;School of Information Technology and Engineering, University of Ottawa, Ottawa, Canada;Department of Translation and Interpreting, University of Málaga, Málaga, Spain;Research Institute in Information and Language Processing, University of Wolverhampton, Wolverhampton, United Kingdom
Venue:
CICLing'10 Proceedings of the 11th international conference on Computational Linguistics and Intelligent Text Processing
Year:
2010



Abstract

This paper presents a machine learning approach to the study of translationese. The goal is to train a computer system to distinguish between translated and non-translated text, in order to determine the characteristic features that influence the classifiers. Several algorithms reach a success rate of up to 97.62% on a technical dataset. Moreover, the SVM classifier consistently achieves a statistically significant improvement in accuracy when simplification features are added to the basic translational classifier system. These findings may therefore be considered an argument for the existence of the Simplification Universal.
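
The abstract does not list the simplification features used. As a minimal sketch of what such features might look like, the following Python function computes three commonly used proxies for textual simplicity: type-token ratio, mean sentence length, and mean word length. The function name, the choice of features, and the tokenization heuristics are all assumptions made for illustration, not the authors' feature set.

```python
import re


def simplification_features(text: str) -> dict:
    """Compute three illustrative simplification-style features for one text.

    Hypothetical helper: the features are assumed proxies, not taken from the paper.
    """
    # Rough sentence split on runs of sentence-final punctuation (a heuristic only).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Word tokens: lowercased runs of Unicode letters.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if not tokens or not sentences:
        return {
            "type_token_ratio": 0.0,
            "mean_sentence_length": 0.0,
            "mean_word_length": 0.0,
        }
    return {
        # Distinct words divided by total words (lexical variety).
        "type_token_ratio": len(set(tokens)) / len(tokens),
        # Average number of word tokens per sentence.
        "mean_sentence_length": len(tokens) / len(sentences),
        # Average number of letters per word token.
        "mean_word_length": sum(len(t) for t in tokens) / len(tokens),
    }
```

In a setup like the one the abstract describes, values of this kind could be appended to a baseline feature vector for each document before training a classifier such as an SVM, so that accuracy with and without them can be compared.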