Revisiting readability: a unified framework for predicting text quality

Authors: Emily Pitler; Ani Nenkova
Affiliation (both authors): University of Pennsylvania, Philadelphia, PA
Venue: EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Year: 2008

Abstract

We combine lexical, syntactic, and discourse features to produce a highly predictive model of human readers' judgments of text readability. This is the first study to take into account such a variety of linguistic factors and the first to empirically demonstrate that discourse relations are strongly associated with the perceived quality of text. We show that various surface metrics generally expected to be related to readability are not very good predictors of readability judgments in our Wall Street Journal corpus. We also establish that readability predictors behave differently depending on the task: predicting the readability of a text or ranking texts by readability. Our experiments indicate that discourse relations are the one class of features that exhibits robustness across these two tasks.
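
The two tasks named in the abstract, predicting a readability score and ranking texts by readability, are evaluated differently. The Python sketch below illustrates that difference only; it is not the authors' model or feature set. The single feature, the ratings, and the helper names `fit_line` and `pairwise_accuracy` are all hypothetical and made up for the example.

```python
from itertools import combinations


def fit_line(xs, ys):
    """Ordinary least squares for one feature: y ~ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pairwise_accuracy(scores, ratings):
    """Fraction of text pairs with different ratings that the scores order correctly."""
    correct = 0
    total = 0
    for i, j in combinations(range(len(ratings)), 2):
        if ratings[i] == ratings[j]:
            continue  # ties carry no ranking information
        total += 1
        if (scores[i] - scores[j]) * (ratings[i] - ratings[j]) > 0:
            correct += 1
    return correct / total


# Made-up toy data: one hypothetical feature value and one human rating per text.
feature = [1.0, 2.0, 3.0, 4.0]
ratings = [2.0, 3.0, 3.5, 5.0]

# Task 1 (prediction): fit a score and compare it to the ratings directly.
slope, intercept = fit_line(feature, ratings)
predicted = [slope * x + intercept for x in feature]

# Task 2 (ranking): only the relative order of each pair of texts matters.
ranking_accuracy = pairwise_accuracy(predicted, ratings)
print(slope, intercept, ranking_accuracy)
```

A feature can do well on one task and poorly on the other: ranking rewards any score that orders pairs correctly, while prediction also penalizes errors in the magnitude of the score.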