Readability formulas are methods used to match texts with readers' reading level. Several methodological paradigms have been investigated in the field. The most popular paradigm dates back several decades and gave rise to well-known readability formulas such as the Flesch formula (among several others). This paper compares this approach (henceforth "classic") with an emerging paradigm that uses sophisticated NLP-enabled features and machine learning techniques. Our experiments, carried out on a corpus of texts for French as a foreign language, yield four main results: (1) the new readability formula performed better than the "classic" formula; (2) "non-classic" features were slightly more informative than "classic" features; (3) modern machine learning algorithms did not improve the explanatory power of our readability model, but allowed new observations to be classified more accurately; and (4) combining "classic" and "non-classic" features resulted in a significant gain in performance.
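To make the "classic" paradigm concrete, the sketch below computes the standard Flesch Reading Ease score, which relies only on two surface features: average sentence length and average syllables per word. It is an illustration under stated assumptions, not the formula evaluated in this paper: the coefficients are those of the English-language Flesch formula, whereas the corpus here is French, and the function names and the vowel-group syllable counter are hypothetical simplifications introduced for this example.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count runs of consecutive vowels.

    Assumption: this heuristic is only an approximation for English and
    ignores silent vowels; a real system would use a pronunciation lexicon.
    """
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease with the standard English coefficients.

    Higher scores indicate text that is easier to read.
    """
    # Naive segmentation: sentences end at ., ! or ?; words are letter runs.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


if __name__ == "__main__":
    easy = "The cat sat on the mat."
    hard = ("Sophisticated computational methodologies necessitate "
            "considerable institutional collaboration.")
    print(round(flesch_reading_ease(easy), 3))
    print(flesch_reading_ease(easy) > flesch_reading_ease(hard))
```

A formula of this kind uses a handful of shallow counts with fixed weights. The "non-classic" paradigm described in the abstract would instead feed richer NLP-derived features into a trained model.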