An analysis of statistical models and features for reading difficulty prediction

  • Authors: Michael Heilman; Kevyn Collins-Thompson; Maxine Eskenazi

  • Affiliations: Carnegie Mellon University, Pittsburgh, PA (all authors)

  • Venue: EANL '08 Proceedings of the Third Workshop on Innovative Use of NLP for Building Educational Applications
  • Year: 2008

Abstract

A reading difficulty measure can be described as a function or model that maps a text to a numerical value corresponding to a difficulty or grade level. We describe a measure of readability that combines lexical features with grammatical features derived from subtrees of syntactic parses. We also tested statistical models for nominal, ordinal, and interval scales of measurement. The results indicate that a model for ordinal regression, such as the proportional odds model, using a combination of grammatical and lexical features is most effective at predicting reading difficulty.
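
The abstract names the proportional odds model without giving its form. The following is a minimal sketch of the textbook formulation, in which the cumulative probability of a text falling at or below grade j is a logistic function of a grade-specific threshold minus a single shared linear score. The feature values, weights, and thresholds are hypothetical placeholders chosen for illustration; they are not the paper's features or fitted parameters.

```python
import math


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def proportional_odds_probs(features, weights, thresholds):
    """Return P(grade = j | features) for each ordered grade level.

    Textbook proportional odds form:
        P(grade <= j | x) = sigmoid(threshold_j - w . x)
    with one weight vector shared by all grades and increasing thresholds.
    """
    score = sum(w * x for w, x in zip(weights, features))
    # Cumulative probabilities for all grades but the last, then 1.0 for the last.
    cumulative = [sigmoid(t - score) for t in thresholds] + [1.0]
    # Differences of adjacent cumulative probabilities give per-grade probabilities.
    probs = [cumulative[0]]
    for lower, upper in zip(cumulative, cumulative[1:]):
        probs.append(upper - lower)
    return probs


def predict_grade(features, weights, thresholds, first_grade=1):
    """Return the most probable grade level."""
    probs = proportional_odds_probs(features, weights, thresholds)
    return first_grade + max(range(len(probs)), key=probs.__getitem__)


# Hypothetical values: one lexical and one grammatical feature, four grades.
weights = [0.8, 1.2]
thresholds = [-1.0, 0.5, 2.0]
easy_text = [-1.0, -1.0]
hard_text = [2.0, 1.5]

easy_grade = predict_grade(easy_text, weights, thresholds)
hard_grade = predict_grade(hard_text, weights, thresholds)
```

Because the weights are shared across grades and only the thresholds differ, the model treats grade levels as ordered categories without assuming equal spacing between them, which is what distinguishes an ordinal model from the nominal and interval alternatives the abstract mentions.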