A comparison of features for automatic readability assessment

  • Authors:
  • Lijun Feng; Martin Jansche; Matt Huenerfauth; Noémie Elhadad

  • Affiliations:
  • City University of New York; Google, Inc.; City University of New York; Columbia University

  • Venue:
  • COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
  • Year:
  • 2010

Abstract

Several sets of explanatory variables (shallow, language-modeling, part-of-speech (POS), syntactic, and discourse features) are compared and evaluated in terms of their impact on predicting the grade level of reading material for primary school students. We find that features based on in-domain language models have the highest predictive power. Entity density (a discourse feature) and POS features, in particular nouns, are individually very useful but highly correlated with each other. Average sentence length (a shallow feature) is more useful, and less expensive to compute, than any individual syntactic feature. A judicious combination of the features examined here yields a significant improvement over the state of the art.
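The abstract singles out two kinds of features: a cheap shallow one (average sentence length) and language-model-based ones, which it reports as most predictive. The sketch below illustrates both under simplifying assumptions that are not taken from the paper: a regex sentence splitter, lowercase word tokens, and an add-one-smoothed unigram model standing in for whatever language models the authors actually used. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after ., !, or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence: str) -> list[str]:
    # Lowercased word tokens; punctuation and digits are dropped (assumption).
    return re.findall(r"[a-z']+", sentence.lower())


def average_sentence_length(text: str) -> float:
    """Shallow feature: mean number of word tokens per sentence."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    return sum(len(tokenize(s)) for s in sentences) / len(sentences)


def unigram_perplexity(train_text: str, test_text: str) -> float:
    """Perplexity of test_text under an add-one-smoothed unigram model
    trained on train_text (a stand-in for an in-domain language model)."""
    train_tokens = tokenize(train_text)
    test_tokens = tokenize(test_text)
    if not test_tokens:
        raise ValueError("test_text contains no tokens")
    counts = Counter(train_tokens)
    # Vocabulary covers training and test words so unseen words get nonzero mass.
    vocab_size = len(set(counts) | set(test_tokens))
    denom = len(train_tokens) + vocab_size
    log_prob = sum(math.log((counts[w] + 1) / denom) for w in test_tokens)
    return math.exp(-log_prob / len(test_tokens))


if __name__ == "__main__":
    doc = "The cat sat. The dog ran away!"
    print(average_sentence_length(doc))
    print(unigram_perplexity("the cat sat on the mat", doc))
```

Lower perplexity means the text looks more like the training corpus. One plausible way to turn this into grade-level features is to train one model per grade and use each model's perplexity on a document as a separate feature; that setup is an illustration here, not a description of the paper's pipeline.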