Natural language processing tools for reading level assessment and text simplification for bilingual education

  • Authors:
  • Mari Ostendorf; Sarah E. Petersen

  • Affiliations:
  • University of Washington; University of Washington

  • Venue:
  • Ph.D. dissertation, University of Washington
  • Year:
  • 2007

Abstract

Reading proficiency is a fundamental component of language competency. However, finding topical texts at an appropriate level for foreign- and second-language learners is a challenge for teachers. We address this problem using natural language processing technology to assess reading level and simplify text. Existing measures of reading level are not well suited to the needs of foreign- and second-language learning. Related work has shown the benefit of statistical language processing techniques; we extend these ideas and include additional features for measuring readability. In the first part of this dissertation we combine features from statistical language models, traditional reading level measures, and other language processing tools to produce a better method of assessing reading level. We discuss the performance of human annotators and evaluate our detectors against human ratings. A key contribution is that our detectors are trainable: with training and test data from the same domain, they outperform more general reading level tools (Flesch-Kincaid and Lexile). Trainability allows performance to be tuned to the needs of particular groups or students. Next, these tools are extended to help teachers take better advantage of the large amounts of text available on the World Wide Web. The tools are augmented to handle web pages returned by a search engine, including filtering steps to eliminate "junk" pages with little or no text. These detectors are manually evaluated by elementary school teachers, the intended audience. We also explore adapting the detectors to the opinions of individual teachers.

In the second part of the dissertation we address text simplification in the context of language learning. We begin by analyzing pairs of original and manually simplified news articles to learn what people most often do when adapting text. Based on this analysis, we investigate two steps in simplification: choosing sentences to keep and splitting sentences. We study existing summarization and syntactic simplification tools applied to these steps and discuss other data-driven methods which could in the future be tuned to particular corpora or users.
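
To make the role of the traditional measures concrete, the sketch below computes a Flesch-Kincaid grade level, one of the general-purpose tools the trainable detectors are compared against in the abstract. This is an illustrative sketch, not code from the dissertation: the syllable counter is a crude vowel-group heuristic introduced here for self-containment, and the dissertation's own detectors go further by combining such surface statistics with statistical language-model features in a trained classifier.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count groups of consecutive vowels.
    A crude heuristic used only to keep this example self-contained."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level, one of the traditional readability
    measures referenced in the abstract (alongside Lexile)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    # Standard Flesch-Kincaid grade-level formula:
    # 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


if __name__ == "__main__":
    sample = ("Reading proficiency is a fundamental component of language "
              "competency. Finding topical texts at an appropriate level "
              "for language learners is a challenge for teachers.")
    print(f"Flesch-Kincaid grade: {flesch_kincaid_grade(sample):.1f}")
```

A score like this is a single surface-level feature; the trainable detectors described above treat it as one input among several, alongside statistical language-model scores and other language processing features, which is what allows them to be retrained for a particular domain, teacher, or group of students.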