A machine learning approach to reading level assessment

  • Authors:
  • Sarah E. Petersen;Mari Ostendorf

  • Affiliations:
  • Department of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States;Department of Electrical Engineering, University of Washington, Seattle, WA 98195, United States

  • Venue:
  • Computer Speech and Language
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

Reading proficiency is a fundamental component of language competency. However, finding topical texts at an appropriate reading level for foreign and second language learners is a challenge for teachers. Existing measures of reading level are not well suited to this task, where students may know some difficult topic-related vocabulary items but not have the same level of sophistication in understanding complex sentence constructions. Recent work in this area has shown the benefit of using statistical language processing techniques. In this paper, we use support vector machines to combine features from n-gram language models, parses, and traditional reading level measures to produce a better method of assessing reading level. We explore the use of negative training data to handle the problem of rejecting data from classes not seen in training, and compare the use of detection vs. regression models on this task. As in many language processing problems, we find substantial variability in human annotation of reading level, and explore ways that multiple human annotations can be used in comparative assessments of system performance.