The effects of learner errors on the development of a collocation detection tool

  • Authors:
  • Yoko Futagi

  • Affiliations:
  • Educational Testing Service, Princeton, NJ, USA

  • Venue:
  • AND '10: Proceedings of the Fourth Workshop on Analytics for Noisy Unstructured Text Data
  • Year:
  • 2010

Abstract

Texts produced by language learners often contain a fair amount of noise, such as misspellings, grammar errors, and word-choice errors. Such noise poses a challenge for the design of automated tools that process these texts, partly because most existing preprocessing tools used prior to linguistic analysis, such as POS taggers and syntactic parsers, are trained on native-speaker data from which errors have been edited out, and are not designed to handle the atypical errors produced by language learners. In designing and implementing an NLP or computer-assisted language learning (CALL) tool, determining which "non-pertinent" errors (i.e., errors not specifically targeted by the tool) to deal with, and how exactly to deal with them, can have a measurable impact on tool performance. This paper discusses how handling some of these "non-pertinent" learner errors during the development of an automated tool for detecting miscollocations in learner texts significantly reduces potential tool errors.
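
The abstract describes this preprocessing concern at a high level rather than giving an algorithm, but the core idea can be sketched: normalizing a non-pertinent error such as a misspelling before collocation lookup keeps the detector from misjudging an otherwise well-formed collocation. The Python sketch below is illustrative only; the tiny collocation inventory, the `normalize` helper, and the similarity cutoff are hypothetical assumptions for demonstration, not the paper's actual method or resources.

```python
import difflib

# Hypothetical miniature inventory of acceptable verb-noun collocations.
# A real tool would derive this from corpus association measures; it is
# hand-listed here purely for illustration.
KNOWN_COLLOCATIONS = {
    ("make", "decision"),
    ("take", "break"),
    ("pay", "attention"),
}

# Vocabulary used to normalize misspellings before collocation lookup.
VOCABULARY = sorted({w for pair in KNOWN_COLLOCATIONS for w in pair})

def normalize(word: str) -> str:
    """Map a possibly misspelled word to its closest in-vocabulary form.

    A crude stand-in for the spelling correction a real pipeline would
    apply before collocation checking; the 0.8 similarity cutoff is an
    arbitrary choice here, not a value from the paper.
    """
    matches = difflib.get_close_matches(word.lower(), VOCABULARY, n=1, cutoff=0.8)
    return matches[0] if matches else word.lower()

def check_collocation(verb: str, noun: str) -> bool:
    """Return True if the normalized verb-noun pair is a known collocation."""
    return (normalize(verb), normalize(noun)) in KNOWN_COLLOCATIONS

# Without normalization, the misspelled "decission" would make this
# well-formed collocation look like a miscollocation (a false positive).
print(check_collocation("make", "decission"))  # True after normalization
print(check_collocation("do", "decision"))     # False: a likely miscollocation
```

The point of the sketch is the ordering of steps: a well-formed pair like "make decision" should not be flagged merely because "decision" was misspelled, so non-pertinent errors are resolved before the pertinent one (word choice) is judged.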