Automated essay scoring for nonnative English speakers

  • Authors: Jill Burstein; Martin Chodorow

  • Affiliations: Educational Testing Service, Princeton, New Jersey; Hunter College, CUNY, New York

  • Venue: ASSESSEVALNLP '99 Proceedings of a Symposium on Computer Mediated Language Assessment and Evaluation in Natural Language Processing
  • Year: 1999


Abstract

The e-rater™ system is an operational automated essay scoring system developed at Educational Testing Service (ETS). Average agreement between two human readers, and between an independent human reader and e-rater, is approximately 92%. The larger writing community has shown considerable interest in how the system performs on essays by nonnative speakers. This paper reports the results of a study of e-rater's performance on Test of Written English (TWE) essay responses written by nonnative English speakers whose native language is Chinese, Arabic, or Spanish. In addition, one small sample of the data is from US-born English speakers, and another is from non-US-born candidates who report that their native language is English. As expected, significant score differences were found between the English-speaking groups and the nonnative speaker groups. Although agreement between e-rater and the human readers also varied somewhat across the language groups, the average agreement rate was as high as operational agreement. At least four of the five features in e-rater's current operational models (including discourse, topical, and syntactic features) also appear in the TWE models. This suggests that the features generalize well over a wide range of linguistic variation: e-rater was not confounded by the nonstandard syntactic structures or stylistic discourse structures that one might expect to be a problem for a system designed to evaluate native speaker writing.
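Agreement in the essay scoring literature is often reported as the proportion of essays on which two scores match exactly or within one point on the holistic scale. The abstract does not specify which definition underlies the ~92% figure, so the following Python sketch is an illustration only; the 1-6 score range and the sample score lists are assumptions, not data from the study.

```python
from typing import Sequence


def agreement_rates(human: Sequence[int], machine: Sequence[int]) -> dict:
    """Exact and adjacent (within one point) agreement between two raters.

    Assumes integer scores on a holistic scale (e.g., 1-6); this is a
    generic illustration, not the computation used in the paper.
    """
    if not human or len(human) != len(machine):
        raise ValueError("Score lists must be non-empty and equal in length.")
    n = len(human)
    exact = sum(h == m for h, m in zip(human, machine))
    adjacent = sum(abs(h - m) <= 1 for h, m in zip(human, machine))
    return {"exact": exact / n, "adjacent": adjacent / n}


if __name__ == "__main__":
    # Hypothetical scores for illustration only.
    human_scores = [4, 5, 3, 6, 2, 4, 5, 3]
    erater_scores = [4, 4, 3, 5, 2, 4, 5, 4]
    print(agreement_rates(human_scores, erater_scores))
```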