Natural language processing tools for reading level assessment and text simplification for bilingual education

  • Authors:
  • Mari Ostendorf; Sarah E. Petersen

  • Affiliations:
  • University of Washington; University of Washington

  • Venue:
  • Ph.D. dissertation, University of Washington
  • Year:
  • 2007

Abstract

Reading proficiency is a fundamental component of language competency. However, finding topical texts at an appropriate level for foreign- and second-language learners is a challenge for teachers. We address this problem using natural language processing technology to assess reading level and simplify text. Existing measures of reading level are not well suited to the needs of foreign- and second-language learning. Related work has shown the benefit of statistical language processing techniques; we extend these ideas and include additional features for measuring readability. In the first part of this dissertation we combine features from statistical language models, traditional reading level measures, and other language processing tools to produce a better method of assessing reading level. We discuss the performance of human annotators and evaluate our detectors against human ratings. A key contribution is that our detectors are trainable: with training and test data from the same domain, they outperform more general reading level tools (Flesch-Kincaid and Lexile). Trainability allows performance to be tuned to the needs of particular groups or students. Next, these tools are extended to help teachers take better advantage of the large amounts of text available on the World Wide Web. The tools are augmented to handle web pages returned by a search engine, including filtering steps to eliminate "junk" pages with little or no text. These detectors are manually evaluated by elementary school teachers, the intended audience. We also explore adapting the detectors to the opinions of individual teachers.

In the second part of the dissertation we address text simplification in the context of language learning. We begin by analyzing pairs of original and manually simplified news articles to learn what people most often do when adapting text. Based on this analysis, we investigate two steps in simplification: choosing sentences to keep and splitting sentences. We study existing summarization and syntactic simplification tools applied to these steps and discuss other data-driven methods which could in the future be tuned to particular corpora or users.
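
To make the role of the traditional measures concrete, the sketch below computes a Flesch-Kincaid grade level, one of the general-purpose tools the trainable detectors are compared against in the abstract. This is an illustrative sketch, not code from the dissertation: the syllable counter is a crude vowel-group heuristic introduced here for self-containment, and the dissertation's own detectors go further by combining such surface statistics with statistical language-model features in a trained classifier.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count groups of consecutive vowels.
    A crude heuristic used only to keep this example self-contained."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level, one of the traditional readability
    measures referenced in the abstract (alongside Lexile)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    # Standard Flesch-Kincaid grade-level formula:
    # 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


if __name__ == "__main__":
    sample = ("Reading proficiency is a fundamental component of language "
              "competency. Finding topical texts at an appropriate level "
              "for language learners is a challenge for teachers.")
    print(f"Flesch-Kincaid grade: {flesch_kincaid_grade(sample):.1f}")
```

A score like this is a single surface-level feature; the trainable detectors described above treat it as one input among several, alongside statistical language-model scores and other language processing features, which is what allows them to be retrained for a particular domain, teacher, or group of students.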