A statistical model for scientific readability. In Proceedings of the Tenth International Conference on Information and Knowledge Management (CIKM '01).
Accurate unlexicalized parsing. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (ACL '03), Volume 1.
Training linear SVMs in linear time. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '06).
Reading level assessment using support vector machines and statistical language models. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL '05).
A machine learning approach to reading level assessment. Computer Speech and Language.
Readability assessment for text simplification. In Proceedings of the NAACL HLT 2010 Fifth Workshop on Innovative Use of NLP for Building Educational Applications (IUNLPBEA '10).
A comparison of features for automatic readability assessment. In Proceedings of the 23rd International Conference on Computational Linguistics (COLING '10): Posters.
Ranking-based readability assessment for early primary children's literature. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT '12).
Early primary children's literature poses some interesting challenges for automated readability assessment: for example, teachers often use fine-grained reading leveling systems to determine appropriate books for children to read, whereas many current systems assess readability at the coarser whole-grade level. In previous work (Ma et al., 2012), we suggested that this fine-grained assessment task can be approached with a ranking methodology, and that incorporating features corresponding to the visual layout of the page improves performance. However, that methodology for using "found" text (e.g., a book scanned in from the library) requires human annotation of the text regions and correction of the OCR output. In this work, we ask whether the annotation process can be automated, and we also experiment with richer syntactic features from the literature that can be derived automatically from either the human-corrected or the raw OCR text. We find that automated visual and text feature extraction works reasonably well and can allow scaling to larger datasets, but that in our particular experiments the syntactic features add little to the performance of the system, contrary to previous findings.
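As a rough illustration of the ranking methodology the abstract refers to, fine-grained leveling can be reduced to pairwise classification in the style of a ranking SVM: each pair of books at different levels yields a feature-difference vector whose label encodes which book is at the higher level. The sketch below is a minimal, hypothetical version of that reduction; the toy feature values, the to_pairs helper, and the use of scikit-learn's LinearSVC are our assumptions, not details taken from the paper.

```python
import numpy as np
from itertools import combinations
from sklearn.svm import LinearSVC

def to_pairs(X, levels):
    """Reduce leveled items to pairwise differences (RankSVM-style).

    Hypothetical helper: each pair of items at different reading
    levels becomes one training example whose label says which
    item of the pair is at the higher level.
    """
    diffs, labels = [], []
    for i, j in combinations(range(len(levels)), 2):
        if levels[i] == levels[j]:
            continue  # ties carry no ranking information
        diffs.append(X[i] - X[j])
        labels.append(1 if levels[i] > levels[j] else -1)
    return np.array(diffs), np.array(labels)

# Toy feature vectors (e.g., words per page, font size, parse depth).
X = np.array([[12.0, 18.0, 2.1],
              [35.0, 14.0, 3.4],
              [60.0, 11.0, 4.8],
              [90.0, 10.0, 5.9]])
levels = [1, 2, 3, 4]  # fine-grained reading levels

X_pairs, y_pairs = to_pairs(X, levels)
clf = LinearSVC().fit(X_pairs, y_pairs)

# The learned weight vector induces a scoring function; sorting new
# books by this score yields the predicted reading-level ranking.
scores = X @ clf.coef_.ravel()
print(np.argsort(scores))
```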
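Similarly, the "richer syntactic features" mentioned above are typically statistics computed over automatic parses (e.g., output from an unlexicalized parser such as Klein and Manning's). The snippet below sketches two common such features, parse-tree height and noun-phrase count, using NLTK's Tree over a hand-written parse; the specific features and the NLTK-based extraction are illustrative assumptions, not the paper's exact pipeline.

```python
from nltk.tree import Tree

# A hand-written parse standing in for automatic parser output.
parse = Tree.fromstring(
    "(S (NP (DT The) (JJ little) (NN cat)) "
    "(VP (VBD sat) (PP (IN on) (NP (DT the) (NN mat)))))")

# Two syntactic statistics commonly used in readability work:
tree_height = parse.height()                # depth of the parse tree
np_count = sum(1 for s in parse.subtrees()  # number of noun phrases
               if s.label() == "NP")

print(tree_height, np_count)
```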