Learning to predict readability using diverse linguistic features

Authors: Rohit J. Kate (The University of Texas at Austin); Xiaoqiang Luo, Siddharth Patwardhan, Martin Franz, Radu Florian (IBM Watson Research Center); Raymond J. Mooney (The University of Texas at Austin); Salim Roukos, Chris Welty (IBM Watson Research Center)
Venue: COLING '10: Proceedings of the 23rd International Conference on Computational Linguistics
Year: 2010

Abstract

In this paper we consider the problem of building a system to predict the readability of natural-language documents. Our system is trained using diverse features, based on syntax and on language models, that are generally indicative of readability. Experimental results on a dataset of documents from a mix of genres show that, when measured against the judgments of linguistically trained expert human judges, the predictions of the learned system are more accurate than those of naive human judges. The experiments also compare the performance of different learning algorithms and of different types of feature sets for predicting readability.
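The pipeline described in the abstract has three steps: extract features from each document, train a learner to map those features to a readability score, and measure how closely the predictions agree with expert judgments. It might be sketched as follows. This is a minimal, hypothetical sketch and does not reproduce the authors' system. The three features (average sentence length, average word length, and log-perplexity under an add-one-smoothed unigram model), the ridge-regression learner, and the use of Pearson correlation as the agreement measure are all illustrative assumptions. The paper's own syntactic features, language models, and learning algorithms are not reproduced here, and every function and class name below is invented for illustration.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive sentence splitter (assumption): breaks on runs of '.', '!' or '?'."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def tokenize(text):
    """Lower-cased alphabetic word tokens (apostrophes kept)."""
    return re.findall(r"[a-z']+", text.lower())


class UnigramLM:
    """Add-one (Laplace) smoothed unigram model: a hypothetical stand-in
    for the language models whose scores serve as readability features."""

    def __init__(self, corpus):
        self.counts = Counter(tokenize(corpus))
        self.total = sum(self.counts.values())
        self.vocab = len(self.counts) + 1  # one extra slot for unseen words

    def perplexity(self, text):
        toks = tokenize(text)
        if not toks:
            return float("inf")
        denom = self.total + self.vocab
        log_prob = sum(math.log((self.counts[t] + 1) / denom) for t in toks)
        return math.exp(-log_prob / len(toks))


def extract_features(text, lm):
    """Hypothetical feature vector:
    [bias, avg sentence length, avg word length, log-perplexity]."""
    toks = tokenize(text)
    n_sents = max(len(split_sentences(text)), 1)
    n_toks = max(len(toks), 1)
    return [
        1.0,  # bias term
        len(toks) / n_sents,
        sum(len(t) for t in toks) / n_toks,
        math.log(lm.perplexity(text)),
    ]


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                for k in range(c, n + 1):
                    M[r][k] -= f * M[c][k]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_ridge(X, y, lam=1e-3):
    """Closed-form ridge regression: solve (X^T X + lam*I) w = X^T y."""
    d = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(r[i] * t for r, t in zip(X, y)) for i in range(d)]
    return solve(A, b)


def predict(w, x):
    """Linear prediction w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def pearson(a, b):
    """Pearson correlation between two equal-length score lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)
```

A typical use under these assumptions: fit `UnigramLM` on a reference corpus, build the feature matrix for the training documents with `extract_features`, fit weights against the experts' readability scores with `fit_ridge`, and report `pearson(predictions, expert_scores)` on held-out documents. Computing the same correlation for the naive judges' scores gives a human baseline of the kind the abstract compares against. Swapping `fit_ridge` for other learners is one way to mirror the abstract's comparison of learning algorithms.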