Building readability lexicons with unannotated corpora

Authors:
Julian Brooke;Vivian Tsang;David Jacob;Fraser Shein;Graeme Hirst
Affiliations:
University of Toronto;Quillsoft Ltd., Toronto, Canada;Quillsoft Ltd., Toronto, Canada;University of Toronto and Quillsoft Ltd., Toronto, Canada;University of Toronto
Venue:
PITR '12 Proceedings of the First Workshop on Predicting and Improving Text Readability for target reader populations
Year:
2012

Abstract

Lexicons of word difficulty are useful for various educational applications, including readability classification and text simplification. In this work, we explore the automatic creation of these lexicons using methods that go beyond simple term frequency but do not rely on age-graded texts. In particular, we derive information for each word type from the readability of the web documents it appears in and from the words it co-occurs with, and we combine these features linearly. We show the efficacy of this approach by comparing our lexicon with an existing coarse-grained, low-coverage resource and with a new crowdsourced annotation.
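
As a rough illustration of the idea of linearly combining per-word features, the sketch below scores each word type from two signals computed on an unannotated corpus: how rare the word is across documents, and the average difficulty of the documents it occurs in. This is not the authors' implementation. The document difficulty proxy (mean word length), the equal weights, the toy corpus, and all function names are assumptions made for illustration, and the co-occurrence features mentioned in the abstract are omitted.

```python
import math
from collections import defaultdict


def doc_difficulty(tokens):
    """Hypothetical document-level difficulty proxy: mean word length."""
    return sum(len(t) for t in tokens) / len(tokens)


def word_features(docs):
    """Map each word type to (negative log document frequency,
    mean difficulty of the documents containing the word)."""
    doc_count = defaultdict(int)
    difficulty_sum = defaultdict(float)
    for tokens in docs:
        d = doc_difficulty(tokens)
        for w in set(tokens):  # count each word once per document
            doc_count[w] += 1
            difficulty_sum[w] += d
    n = len(docs)
    return {
        w: (-math.log(doc_count[w] / n), difficulty_sum[w] / doc_count[w])
        for w in doc_count
    }


def build_lexicon(docs, weights=(0.5, 0.5)):
    """Linearly combine the per-word features (weights are assumed, not learned)."""
    feats = word_features(docs)
    return {
        w: sum(wt * f for wt, f in zip(weights, fs))
        for w, fs in feats.items()
    }


# Toy corpus, invented for this example.
docs = [
    "the cat sat on the mat".split(),
    "the dog ran to the cat".split(),
    "the epidemiological investigation identified transmission".split(),
]
lexicon = build_lexicon(docs)
for word in ("cat", "mat", "epidemiological"):
    print(word, round(lexicon[word], 3))
```

In this toy setting, a word that appears only in the harder document scores higher than a word confined to the easier ones. In practice the features would need to be put on a common scale, and the weights fitted or tuned, before such a combination is meaningful.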