A statistical model for scientific readability. In Proceedings of the Tenth International Conference on Information and Knowledge Management (CIKM '01).
Accurate unlexicalized parsing. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (ACL '03), Volume 1.
Training linear SVMs in linear time. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '06).
Reading level assessment using support vector machines and statistical language models. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL '05).
A machine learning approach to reading level assessment. Computer Speech and Language.
Readability assessment for text simplification. In Proceedings of the NAACL HLT 2010 Fifth Workshop on Innovative Use of NLP for Building Educational Applications (IUNLPBEA '10).
A comparison of features for automatic readability assessment. In Proceedings of the 23rd International Conference on Computational Linguistics (COLING '10): Posters.
Ranking-based readability assessment for early primary children's literature. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT '12).
Early primary children's literature poses some interesting challenges for automated readability assessment: for example, teachers often use fine-grained reading leveling systems to determine appropriate books for children to read, whereas many current systems assess readability at the coarser whole-grade level. In previous work (Ma et al., 2012), we suggested that this fine-grained assessment task can be approached with a ranking methodology, and that incorporating features corresponding to the visual layout of the page improves performance. However, that methodology for using "found" text (e.g., a book scanned in from the library) requires human annotation of the text regions and correction of the OCR output. In this work, we ask whether the annotation process can be automated, and we also experiment with richer syntactic features from the literature that can be derived automatically from either the human-corrected or the raw OCR text. We find that automated visual and text feature extraction works reasonably well and can allow scaling to larger datasets, but that in our particular experiments the syntactic features add little to the performance of the system, contrary to previous findings.
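As a rough illustration of the ranking methodology the abstract refers to, fine-grained leveling can be reduced to pairwise classification in the style of a ranking SVM: each pair of books at different levels yields a feature-difference vector whose label encodes which book is at the higher level. The sketch below is a minimal, hypothetical version of that reduction; the toy feature values, the to_pairs helper, and the use of scikit-learn's LinearSVC are our assumptions, not details taken from the paper.

```python
import numpy as np
from itertools import combinations
from sklearn.svm import LinearSVC

def to_pairs(X, levels):
    """Reduce leveled items to pairwise differences (RankSVM-style).

    Hypothetical helper: each pair of items at different reading
    levels becomes one training example whose label says which
    item of the pair is at the higher level.
    """
    diffs, labels = [], []
    for i, j in combinations(range(len(levels)), 2):
        if levels[i] == levels[j]:
            continue  # ties carry no ranking information
        diffs.append(X[i] - X[j])
        labels.append(1 if levels[i] > levels[j] else -1)
    return np.array(diffs), np.array(labels)

# Toy feature vectors (e.g., words per page, font size, parse depth).
X = np.array([[12.0, 18.0, 2.1],
              [35.0, 14.0, 3.4],
              [60.0, 11.0, 4.8],
              [90.0, 10.0, 5.9]])
levels = [1, 2, 3, 4]  # fine-grained reading levels

X_pairs, y_pairs = to_pairs(X, levels)
clf = LinearSVC().fit(X_pairs, y_pairs)

# The learned weight vector induces a scoring function; sorting new
# books by this score yields the predicted reading-level ranking.
scores = X @ clf.coef_.ravel()
print(np.argsort(scores))
```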
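Similarly, the "richer syntactic features" mentioned above are typically statistics computed over automatic parses (e.g., output from an unlexicalized parser such as Klein and Manning's). The snippet below sketches two common such features, parse-tree height and noun-phrase count, using NLTK's Tree over a hand-written parse; the specific features and the NLTK-based extraction are illustrative assumptions, not the paper's exact pipeline.

```python
from nltk.tree import Tree

# A hand-written parse standing in for automatic parser output.
parse = Tree.fromstring(
    "(S (NP (DT The) (JJ little) (NN cat)) "
    "(VP (VBD sat) (PP (IN on) (NP (DT the) (NN mat)))))")

# Two syntactic statistics commonly used in readability work:
tree_height = parse.height()                # depth of the parse tree
np_count = sum(1 for s in parse.subtrees()  # number of noun phrases
               if s.label() == "NP")

print(tree_height, np_count)
```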