A potentially useful feature of information retrieval systems for students is the ability to identify documents that are not only relevant to the query but also matched to the student's reading level. Manually estimating reading difficulty for each document is not feasible for very large collections, so an automated technique is required. Traditional readability measures, such as the widely used Flesch-Kincaid measure, are simple to apply but perform poorly on Web pages and other non-traditional documents. This work focuses on building a broadly applicable statistical model of text at different reading levels. To do this, we recast the well-studied problem of readability as text categorization and use straightforward techniques from statistical language modeling. We show that with a modified form of text categorization, it is possible to build generally applicable classifiers from relatively little training data. We apply this method to classifying Web pages by reading difficulty level and show that, by using a mixture model to interpolate evidence of a word's frequency across grades, it is possible to build a classifier that achieves an average root mean squared error of between one and two grade levels for 9 of 12 grades. Such classifiers have very efficient implementations and can be applied in many different scenarios. The models can be varied to focus on smaller or larger grade ranges, or easily retrained for a variety of tasks or populations.
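The general idea of treating readability as text categorization with per-grade language models can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian weighting over grade distance, the additive smoothing constant, the toy training sentences, and every function name below are assumptions chosen only to show how evidence about a word's frequency might be interpolated across grades before picking the most likely grade.

```python
import math
from collections import Counter


def train_grade_models(docs_by_grade):
    """Build one unigram count table per grade: {grade: Counter(word -> count)}."""
    return {
        grade: Counter(w for doc in docs for w in doc.lower().split())
        for grade, docs in docs_by_grade.items()
    }


def smoothed_prob(word, grade, counts, vocab_size, width=1.0, alpha=0.1):
    """Mixture estimate of P(word | grade).

    Each grade's additive-smoothed relative frequency is weighted by a
    Gaussian kernel on grade distance (an assumed weighting scheme), so
    nearby grades contribute more evidence than distant ones.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for g, table in counts.items():
        weight = math.exp(-((g - grade) ** 2) / (2 * width ** 2))
        total = sum(table.values())
        weighted_sum += weight * (table[word] + alpha) / (total + alpha * vocab_size)
        weight_total += weight
    return weighted_sum / weight_total


def predict_grade(text, counts):
    """Return the grade whose smoothed model gives the text the highest log-likelihood."""
    vocab = set().union(*counts.values())
    vocab_size = len(vocab) + 1  # one extra slot for unseen words
    tokens = text.lower().split()
    scores = {
        g: sum(math.log(smoothed_prob(w, g, counts, vocab_size)) for w in tokens)
        for g in counts
    }
    return max(scores, key=scores.get)


# Hypothetical toy training data; a real system would use many labeled pages per grade.
toy_docs = {
    1: ["the cat sat on the mat", "the dog ran to the red ball"],
    6: ["the river carried sediment toward the distant valley",
        "farmers stored grain before the long winter"],
    12: ["the hypothesis implies a significant statistical correlation",
         "empirical analysis contradicts the prevailing economic theory"],
}
counts = train_grade_models(toy_docs)

print(predict_grade("the cat ran to the ball", counts))
print(predict_grade("statistical hypothesis correlation", counts))
```

Scoring a document is one log-probability lookup per token per grade, which is consistent with the abstract's remark that such classifiers can be implemented efficiently. Widening `width` in this sketch lets sparse grades borrow more evidence from their neighbors, at the cost of blurring distinctions between adjacent grades.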