Detection of non-native sentences using machine-translated training data

Authors:
John Lee;Ming Zhou;Xiaohua Liu
Affiliations:
MIT CSAIL, Cambridge, MA;Microsoft Research Asia, Beijing, China;Microsoft Research Asia, Beijing, China
Venue:
NAACL-Short '07 Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers
Year:
2007

Abstract

Training statistical models to detect non-native sentences requires a large corpus of non-native writing samples, which is often not readily available. This paper examines the extent to which machine-translated (MT) sentences can substitute for such samples as training data. Two tasks are examined. For the native vs. non-native classification task, non-native training data yields better performance; for the ranking task, however, models trained with a large, publicly available set of MT data perform as well as those trained with non-native data.
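
The difference between the two tasks can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract does not describe the models, features, or evaluation protocol, so everything here is an assumption made for illustration. In particular, `toy_scorer`, the `SUSPICIOUS` word list, the threshold, the example sentences, and the pairwise formulation of the ranking task are all hypothetical. The sketch only shows that classification needs an absolute decision per sentence, while ranking needs only relative scores to be ordered correctly.

```python
from typing import Callable, Sequence, Tuple

Scorer = Callable[[str], float]

# Hypothetical word list standing in for a trained model's features.
SUSPICIOUS = {"informations", "advices"}


def toy_scorer(sentence: str) -> float:
    """Toy 'nativeness' score: minus the number of suspicious tokens."""
    tokens = [tok.strip(".,!?") for tok in sentence.lower().split()]
    return float(-sum(tok in SUSPICIOUS for tok in tokens))


def classification_accuracy(
    scorer: Scorer,
    labelled: Sequence[Tuple[str, bool]],
    threshold: float = 0.0,
) -> float:
    """Fraction of sentences whose thresholded score matches the gold label.

    A sentence is predicted native when its score is >= threshold.
    """
    correct = sum((scorer(s) >= threshold) == is_native for s, is_native in labelled)
    return correct / len(labelled)


def ranking_accuracy(scorer: Scorer, pairs: Sequence[Tuple[str, str]]) -> float:
    """Fraction of (native, non_native) pairs where the native sentence
    receives a strictly higher score (assumed pairwise formulation)."""
    correct = sum(scorer(native) > scorer(non_native) for native, non_native in pairs)
    return correct / len(pairs)


# Made-up example data, for illustration only.
labelled = [
    ("She gave me useful information.", True),
    ("She gave me useful informations.", False),
    ("He has many advices.", False),
]
pairs = [
    ("She gave me useful information.", "She gave me useful informations."),
    ("We discussed the plan.", "We discussed about the plan."),
]

if __name__ == "__main__":
    print(classification_accuracy(toy_scorer, labelled))
    print(ranking_accuracy(toy_scorer, pairs))
```

In this toy setup the second pair is a ranking miss, because the scorer assigns both sentences the same score; a classifier threshold would not expose that, which is one reason the two tasks can respond differently to the choice of training data.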