A posteriori agreement as a quality measure for readability prediction systems

  • Authors:
  • Philip Van Oosten; Véronique Hoste; Dries Tanghe

  • Affiliations:
  • Language and Translation Technology Team, University College Ghent, Ghent, Belgium and Ghent University, Ghent, Belgium (all authors)

  • Venue:
  • CICLing'11: Proceedings of the 12th International Conference on Computational Linguistics and Intelligent Text Processing - Volume Part II
  • Year:
  • 2011

Abstract

All readability research is ultimately concerned with the question of whether a prediction system can automatically determine the readability level of an unseen text. A significant problem for such a system is that readability may depend in part on the reader. If different readers assess the readability of texts in fundamentally different ways, there is insufficient a priori agreement to justify the correctness of a readability prediction system trained on the texts those readers assessed. We built a data set of readability assessments by expert readers. We clustered the experts into groups with greater a priori agreement and then measured, for each group, whether classifiers trained only on that group's data exhibited a classification bias. As this was found to be the case, the classification mechanism cannot be unproblematically generalized to a different user group.
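
The abstract outlines an experimental design: measure a priori agreement among expert annotators, cluster the experts into more homogeneous groups, train a classifier per group, and check whether the groups' classifiers diverge. The sketch below illustrates that design in Python under stated assumptions; it is not the paper's actual implementation. The data structures (`labels[i][j]` as expert i's readability class for text j, `X` as a feature matrix over texts), the use of Cohen's kappa as the agreement measure, average-linkage clustering, the rounded-mean consensus label, and the logistic regression classifier are all illustrative choices.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import cohen_kappa_score


def agreement_clusters(labels, n_groups=2):
    """Cluster experts by a priori agreement (pairwise Cohen's kappa).

    labels: list of equal-length label arrays, one per expert.
    Returns a 1-based cluster id for each expert.
    """
    n = len(labels)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            # Distance = 1 - kappa, so experts who agree end up close together.
            k = cohen_kappa_score(labels[i], labels[j])
            dist[i, j] = dist[j, i] = 1.0 - k
    tree = linkage(squareform(dist), method="average")
    return fcluster(tree, t=n_groups, criterion="maxclust")


def per_group_predictions(X, labels, groups):
    """Train one classifier per expert group and predict on the same texts.

    Systematic differences between the groups' predictions would surface
    the kind of classification bias the paper reports.
    """
    predictions = {}
    for g in sorted(set(groups)):
        members = [i for i, gi in enumerate(groups) if gi == g]
        # Rounded mean of the members' labels as a simple consensus label
        # (a simplification; any group-level gold standard would do here).
        y = np.round(np.mean([labels[i] for i in members], axis=0)).astype(int)
        model = LogisticRegression(max_iter=1000).fit(X, y)
        predictions[g] = model.predict(X)
    return predictions
```

Comparing `predictions[g]` across groups on the same texts is one straightforward way to quantify whether a model trained on one annotator group transfers to another, which is the generalization question the abstract raises.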