The e-rater system™ is an operational automated essay scoring system developed at Educational Testing Service (ETS). The average agreement between two human readers, and between an independent human reader and e-rater, is approximately 92%. There is considerable interest in the larger writing community in examining the system's performance on nonnative speaker essays. This paper reports the results of a study of e-rater's performance on Test of Written English (TWE) essay responses written by nonnative English speakers whose native language is Chinese, Arabic, or Spanish. In addition, one small sample of the data is from US-born English speakers, and another is from non-US-born candidates who report that their native language is English. As expected, significant differences were found between the scores of the English-speaking groups and those of the nonnative speakers. While there were also differences between e-rater and the human readers across the various language groups, the average agreement rate was as high as the operational agreement rate. At least four of the five features included in e-rater's current operational models (including discourse, topical, and syntactic features) also appear in the TWE models. This suggests that the features generalize well over a wide range of linguistic variation: e-rater was not confounded by non-standard English syntactic structures or stylistic discourse structures that one might expect to be a problem for a system designed to evaluate native-speaker writing.
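The ~92% agreement figure is easiest to read as an agreement rate between pairs of raters over a set of essays. The sketch below (Python) shows how such a rate can be computed; the 1-6 score scale, the one-point "adjacent agreement" tolerance, and the example scores are assumptions for illustration only, not details taken from the paper.

```python
# Minimal sketch of a rater-agreement statistic like the one cited in the abstract.
# Assumptions (not from the source): essays are scored on a 1-6 scale, and
# "agreement" counts scores that match exactly or differ by at most one point,
# a common convention in essay scoring.

def agreement_rate(scores_a, scores_b, tolerance=1):
    """Fraction of essays on which two raters agree within `tolerance` points."""
    if len(scores_a) != len(scores_b):
        raise ValueError("Score lists must be the same length")
    hits = sum(1 for a, b in zip(scores_a, scores_b) if abs(a - b) <= tolerance)
    return hits / len(scores_a)

# Hypothetical example: one human reader's scores vs. e-rater's scores.
human   = [4, 5, 3, 6, 2]
e_rater = [4, 4, 3, 5, 4]

print(f"exact agreement:            {agreement_rate(human, e_rater, tolerance=0):.2f}")
print(f"exact + adjacent agreement: {agreement_rate(human, e_rater, tolerance=1):.2f}")
```

Under these assumptions, the reported figure would correspond to the exact-plus-adjacent rate averaged over many essays and rater pairs; the exact definition used operationally is not specified in the abstract.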