A machine learning approach to reading level assessment

Authors:
Sarah E. Petersen; Mari Ostendorf
Affiliations:
Department of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States; Department of Electrical Engineering, University of Washington, Seattle, WA 98195, United States
Venue:
Computer Speech and Language
Year:
2009

Abstract

Reading proficiency is a fundamental component of language competency. However, finding topical texts at an appropriate reading level for foreign and second language learners is a challenge for teachers. Existing measures of reading level are not well suited to this task, where students may know some difficult topic-related vocabulary items but not have the same level of sophistication in understanding complex sentence constructions. Recent work in this area has shown the benefit of using statistical language processing techniques. In this paper, we use support vector machines to combine features from n-gram language models, parses, and traditional reading level measures to produce a better method of assessing reading level. We explore the use of negative training data to handle the problem of rejecting data from classes not seen in training, and compare the use of detection vs. regression models on this task. As in many language processing problems, we find substantial variability in human annotation of reading level, and explore ways that multiple human annotations can be used in comparative assessments of system performance.
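
The abstract does not say which traditional reading level measures are used as features, so the following is only a minimal sketch under the assumption that the Flesch-Kincaid grade level is a representative example. It computes a few surface features of the kind that could sit alongside language model and parse features in a classifier's input vector. The function names `count_syllables` and `surface_features` are hypothetical, and the syllable counter is a crude vowel-group heuristic rather than a dictionary lookup.

```python
import re


def count_syllables(word):
    """Approximate syllables as runs of consecutive vowels (heuristic assumption)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def surface_features(text):
    """Return simple surface features, including the Flesch-Kincaid grade level."""
    # Split on sentence-final punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Treat alphabetic tokens (with an optional internal apostrophe) as words.
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = sum(count_syllables(w) for w in words) / len(words)

    # Standard Flesch-Kincaid grade level formula.
    fk_grade = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59

    return {
        "words_per_sentence": words_per_sentence,
        "syllables_per_word": syllables_per_word,
        "fk_grade": fk_grade,
    }


if __name__ == "__main__":
    sample = "Reading is fun. Complex subordinate constructions complicate comprehension."
    for name, value in surface_features(sample).items():
        print(f"{name}: {value:.2f}")
```

Features like these depend only on word and sentence length, which is one reason the abstract argues for combining them with n-gram language model and parse features: a learner may know long topic-specific words yet still struggle with complex sentence structure.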