The effects of learner errors on the development of a collocation detection tool

  • Authors:
  • Yoko Futagi

  • Affiliations:
  • Educational Testing Service, Princeton, NJ, USA

  • Venue:
  • AND '10: Proceedings of the Fourth Workshop on Analytics for Noisy Unstructured Text Data
  • Year:
  • 2010

Abstract

Texts produced by language learners often contain a fair amount of noise, such as misspellings, grammar errors, and word-choice errors. Such noise poses a challenge for the design of automated tools that process these texts, partly because most existing preprocessing tools used prior to linguistic analysis, such as POS taggers and syntactic parsers, are trained on native-speaker data from which errors have been edited out, and are not designed to handle the atypical errors produced by language learners. In designing and implementing an NLP or computer-assisted language learning (CALL) tool, determining which "non-pertinent" errors (i.e., errors not specifically targeted by the tool) to deal with, and how exactly to deal with them, can have a measurable impact on tool performance. This paper discusses how handling some of these "non-pertinent" learner errors during the development of an automated tool for detecting miscollocations in learner texts significantly reduces potential tool errors.
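
The abstract describes this preprocessing concern at a high level rather than giving an algorithm, but the core idea can be sketched: normalizing a non-pertinent error such as a misspelling before collocation lookup keeps the detector from misjudging an otherwise well-formed collocation. The Python sketch below is illustrative only; the tiny collocation inventory, the `normalize` helper, and the similarity cutoff are hypothetical assumptions for demonstration, not the paper's actual method or resources.

```python
import difflib

# Hypothetical miniature inventory of acceptable verb-noun collocations.
# A real tool would derive this from corpus association measures; it is
# hand-listed here purely for illustration.
KNOWN_COLLOCATIONS = {
    ("make", "decision"),
    ("take", "break"),
    ("pay", "attention"),
}

# Vocabulary used to normalize misspellings before collocation lookup.
VOCABULARY = sorted({w for pair in KNOWN_COLLOCATIONS for w in pair})

def normalize(word: str) -> str:
    """Map a possibly misspelled word to its closest in-vocabulary form.

    A crude stand-in for the spelling correction a real pipeline would
    apply before collocation checking; the 0.8 similarity cutoff is an
    arbitrary choice here, not a value from the paper.
    """
    matches = difflib.get_close_matches(word.lower(), VOCABULARY, n=1, cutoff=0.8)
    return matches[0] if matches else word.lower()

def check_collocation(verb: str, noun: str) -> bool:
    """Return True if the normalized verb-noun pair is a known collocation."""
    return (normalize(verb), normalize(noun)) in KNOWN_COLLOCATIONS

# Without normalization, the misspelled "decission" would make this
# well-formed collocation look like a miscollocation (a false positive).
print(check_collocation("make", "decission"))  # True after normalization
print(check_collocation("do", "decision"))     # False: a likely miscollocation
```

The point of the sketch is the ordering of steps: a well-formed pair like "make decision" should not be flagged merely because "decision" was misspelled, so non-pertinent errors are resolved before the pertinent one (word choice) is judged.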