Comparing human versus automatic feature extraction for fine-grained elementary readability assessment

  • Authors:
  • Yi Ma, Ritu Singh, Eric Fosler-Lussier, Robert Lofthus

  • Affiliations:
  • The Ohio State University, Columbus, OH (Yi Ma, Ritu Singh, Eric Fosler-Lussier); Xerox Corporation, Rochester, NY (Robert Lofthus)

  • Venue:
  • PITR '12 Proceedings of the First Workshop on Predicting and Improving Text Readability for target reader populations
  • Year:
  • 2012

Abstract

Early primary children's literature poses some interesting challenges for automated readability assessment: for example, teachers often use fine-grained reading leveling systems for determining appropriate books for children to read (many current systems approach readability assessment at a coarser whole grade level). In previous work (Ma et al., 2012), we suggested that the fine-grained assessment task can be approached using a ranking methodology, and incorporating features that correspond to the visual layout of the page improves performance. However, the previous methodology for using "found" text (e.g., scanning in a book from the library) requires human annotation of the text regions and correction of the OCR text. In this work, we ask whether the annotation process can be automated, and also experiment with richer syntactic features found in the literature that can be automatically derived from either the human-corrected or raw OCR text. We find that automated visual and text feature extraction work reasonably well and can allow for scaling to larger datasets, but that in our particular experiments the use of syntactic features adds little to the performance of the system, contrary to previous findings.
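The abstract does not spell out the ranking methodology, but fine-grained readability ranking is commonly reduced to binary classification over pairwise feature differences (a ranking-SVM-style formulation). The sketch below illustrates that reduction under stated assumptions: the feature names (words per page, image area ratio, average word length) and the toy data are illustrative only, and this is not the authors' implementation.

```python
# Minimal sketch of a pairwise-ranking reduction for readability assessment.
# Assumptions: hypothetical per-book features and levels; not the paper's code.
import numpy as np
from itertools import combinations
from sklearn.svm import LinearSVC

# Hypothetical per-book feature vectors (text + visual layout features)
# and fine-grained reading levels (higher = harder).
X = np.array([
    [12.0, 0.31, 3.1],   # words per page, image area ratio, avg word length
    [25.0, 0.18, 3.6],
    [48.0, 0.05, 4.2],
    [70.0, 0.02, 4.8],
])
y = np.array([1, 2, 3, 4])  # reading levels

# Reduce ranking to binary classification on pairwise feature differences,
# adding both orderings of each pair so both classes are represented.
pairs, labels = [], []
for i, j in combinations(range(len(y)), 2):
    if y[i] == y[j]:
        continue
    diff = X[i] - X[j]
    sign = 1 if y[i] > y[j] else -1
    pairs.extend([diff, -diff])
    labels.extend([sign, -sign])

clf = LinearSVC(C=1.0)
clf.fit(np.array(pairs), np.array(labels))

# Rank unseen books by their score under the learned weight vector.
scores = X @ clf.coef_.ravel()
print(np.argsort(scores))  # indices ordered from predicted easiest to hardest
```

In this formulation, richer features (e.g., automatically extracted visual layout or syntactic features) simply extend each book's feature vector; the pairwise reduction itself is unchanged.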