Detection of non-native sentences using machine-translated training data

  • Authors:
  • John Lee;Ming Zhou;Xiaohua Liu

  • Affiliations:
  • MIT CSAIL, Cambridge, MA;Microsoft Research Asia, Beijing, China;Microsoft Research Asia, Beijing, China

  • Venue:
  • NAACL-Short '07 Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

Training statistical models to detect non-native sentences requires a large corpus of non-native writing samples, which is often not readily available. This paper examines the extent to which machine-translated (MT) sentences can substitute as training data. Two tasks are examined. For the native vs non-native classification task, non-native training data yields better performance; for the ranking task, however, models trained with a large, publicly available set of MT data perform as well as those trained with non-native data.