Learning to predict readability using diverse linguistic features

  • Authors:
  • Rohit J. Kate;Xiaoqiang Luo;Siddharth Patwardhan;Martin Franz;Radu Florian;Raymond J. Mooney;Salim Roukos;Chris Welty

  • Affiliations:
  • The University of Texas at Austin;IBM Watson Research Center;IBM Watson Research Center;IBM Watson Research Center;IBM Watson Research Center;The University of Texas at Austin;IBM Watson Research Center;IBM Watson Research Center

  • Venue:
  • COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
  • Year:
  • 2010

Abstract

In this paper we consider the problem of building a system to predict the readability of natural-language documents. Our system is trained using diverse features, based on syntax and on language models, that are generally indicative of readability. Experimental results on a dataset of documents from a mix of genres show that, when measured against the judgments of linguistically trained expert human judges, the predictions of the learned system are more accurate than those of naive human judges. The experiments also compare the performance of different learning algorithms and different types of feature sets for predicting readability.
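The abstract mentions features "based on syntax and language models" without listing them. The sketch below shows what one language-model-based feature might look like, paired with a simple length feature. It is not the authors' feature set. The add-one smoothed unigram model, the function names (`train_unigram`, `perplexity`, `readability_features`), and the tokenization are hypothetical choices made for this example. The intuition is that text which is improbable under a reference language model tends to be harder to read.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Hypothetical tokenizer: lowercase word-like runs only.
    return re.findall(r"[a-z']+", text.lower())

def train_unigram(corpus_text):
    """Add-one (Laplace) smoothed unigram model (an assumed, illustrative LM)."""
    counts = Counter(tokenize(corpus_text))
    return counts, sum(counts.values()), len(counts)

def perplexity(tokens, model):
    counts, total, vocab = model
    # Reserve one extra type for all unseen words so probabilities sum to 1.
    denom = total + vocab + 1
    if not tokens:
        return float("inf")
    log_prob = sum(math.log((counts[t] + 1) / denom) for t in tokens)
    return math.exp(-log_prob / len(tokens))

def readability_features(text, model):
    """Return a small, hypothetical feature vector for one document."""
    tokens = tokenize(text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "avg_sentence_len": len(tokens) / max(len(sentences), 1),
        "unigram_perplexity": perplexity(tokens, model),
    }
```

In a full system, vectors like these would be fed to a learned regressor or classifier. The abstract notes that several learning algorithms were compared, but it does not say which ones.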