The Web Content Accessibility Guidelines (WCAG) 2.0 include, under their principle of comprehensibility, an accessibility requirement related to the level of writing. This requirement states that websites whose texts demand reading skills beyond those of individuals with lower secondary education (fifth to ninth grades in Brazil) should offer an alternative version of the same content. Natural Language Processing technology and research in Psycholinguistics can help automate the task of classifying a text according to its reading difficulty. In this paper, we present experiments to build a readability checker that classifies naturally occurring texts in Portuguese, considering different text genres, domains and reader ages. More precisely, we classify texts as simple (for 7- to 14-year-olds) or complex (for adults), and address three key research questions: (1) Which machine-learning algorithm produces the best results? (2) Which features are relevant? (3) Do different text genres have an impact on readability assessment?