Building readability lexicons with unannotated corpora

  • Authors:
  • Julian Brooke; Vivian Tsang; David Jacob; Fraser Shein; Graeme Hirst

  • Affiliations:
  • University of Toronto; Quillsoft Ltd., Toronto, Canada; Quillsoft Ltd., Toronto, Canada; University of Toronto and Quillsoft Ltd., Toronto, Canada; University of Toronto

  • Venue:
  • PITR '12: Proceedings of the First Workshop on Predicting and Improving Text Readability for target reader populations
  • Year:
  • 2012

Abstract

Lexicons of word difficulty are useful for various educational applications, including readability classification and text simplification. In this work, we explore automatic creation of these lexicons using methods which go beyond simple term frequency, but without relying on age-graded texts. In particular, we derive information for each word type from the readability of the web documents they appear in and the words they co-occur with, linearly combining these various features. We show the efficacy of this approach by comparing our lexicon with an existing coarse-grained, low-coverage resource and a new crowdsourced annotation.
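
The linear combination of per-word features mentioned in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' implementation: the feature names (`rarity`, `doc_difficulty`, `cooc_difficulty`), the toy values, and the hand-picked weights are all hypothetical, and the abstract does not say how the features are scaled or how the weights are chosen.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale a list of numbers to zero mean and unit variance."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def combine(features, weights):
    """Linearly combine standardized per-word features into one score.

    features: feature name -> {word -> raw value}
    weights:  feature name -> weight (hypothetical, hand-picked here)
    """
    words = sorted(next(iter(features.values())))
    z = {
        name: dict(zip(words, standardize([vals[w] for w in words])))
        for name, vals in features.items()
    }
    return {w: sum(weights[n] * z[n][w] for n in features) for w in words}


# Toy values, invented for illustration only (higher = harder).
features = {
    # rarity: e.g. negative log frequency of the word
    "rarity": {"cat": 2.0, "house": 1.5, "theorem": 6.0},
    # doc_difficulty: e.g. mean readability score of documents containing the word
    "doc_difficulty": {"cat": 4.0, "house": 5.0, "theorem": 11.0},
    # cooc_difficulty: e.g. mean difficulty of co-occurring words
    "cooc_difficulty": {"cat": 3.5, "house": 4.0, "theorem": 10.0},
}
weights = {"rarity": 0.2, "doc_difficulty": 0.4, "cooc_difficulty": 0.4}

scores = combine(features, weights)
for word, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{word}: {score:.3f}")
```

Standardizing each feature before weighting keeps features on different scales from dominating the sum; in practice the weights would presumably be fitted against some reference difficulty data rather than set by hand.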