Combining a statistical language model with logistic regression to predict the lexical and syntactic difficulty of texts for FFL

Authors:
Thomas L. François
Affiliations:
Université catholique de Louvain, Louvain-la-Neuve, Belgium
Venue:
EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop
Year:
2009

Abstract

Reading is known to be an essential task in language learning, but finding an appropriate text for every learner is far from easy. In this context, automatic procedures can support the teacher's work. Some tools exist for English, but at present there are none for French as a foreign language (FFL). In this paper, we present an original approach to assessing the readability of FFL texts using NLP techniques and extracts from FFL textbooks as our corpus. Two logistic regression models based on lexical and grammatical features are explored and give quite good predictions on new texts. The results show a slight superiority for multinomial logistic regression over the proportional odds model.
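
For readers unfamiliar with the two models named in the abstract, their standard textbook forms are sketched below. The notation is an assumption made for illustration and is not taken from the paper: x is a vector of p lexical and grammatical features of a text, Y is its difficulty level among K ordered levels, and the alphas and betas are fitted parameters.

```latex
% Multinomial logistic regression: one coefficient vector per level,
% with level K as the reference (\alpha_K = 0, \beta_K = 0).
P(Y = k \mid x) =
  \frac{\exp(\alpha_k + \beta_k^{\top} x)}
       {\sum_{j=1}^{K} \exp(\alpha_j + \beta_j^{\top} x)},
  \qquad k = 1, \dots, K

% Proportional odds model: a single coefficient vector shared by all
% levels, with ordered thresholds \alpha_1 \le \dots \le \alpha_{K-1}.
\log \frac{P(Y \le k \mid x)}{P(Y > k \mid x)} = \alpha_k - \beta^{\top} x,
  \qquad k = 1, \dots, K - 1
```

In this formulation the multinomial model estimates (K - 1)(p + 1) parameters and ignores the ordering of the levels, whereas the proportional odds model estimates only (K - 1) + p parameters but assumes each feature has the same effect at every threshold between levels.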