Detecting errors in English article usage by non-native speakers

Authors:
Na-Rae Han; Martin Chodorow; Claudia Leacock
Affiliations:
University of Pennsylvania, 619 Williams Hall, 36th & Spruce Street, Philadelphia, PA 19104, USA, e-mail: nrh@ling.upenn.edu and Educational Testing Service, Rosedale Rd. MS 13E, Princeton, NJ 08541 ...; Hunter College of the City University of New York, 695 Park Avenue, New York, NY 10021, USA, e-mail: mchodoro@hunter.cuny.edu; Pearson Knowledge Technologies, 4940 Pearl East Circle, Boulder, CO 80301, USA, e-mail: cleacock@pearsonkt.com
Venue:
Natural Language Engineering
Year:
2006

Abstract

One of the most difficult challenges faced by non-native speakers of English is mastering the system of English articles. We trained a maximum entropy classifier to select among a/an, the, or zero article for noun phrases (NPs), based on a set of features extracted from the local context of each. When the classifier was trained on 6 million NPs, its performance on published text was about 83% correct. We then used the classifier to detect article errors in the TOEFL essays of native speakers of Chinese, Japanese, and Russian. These writers made such errors in about one out of every eight NPs, or almost once in every three sentences. The classifier's agreement with human annotators was 85% (kappa = 0.48) when it selected among a/an, the, or zero article. Agreement was 89% (kappa = 0.56) when it made a binary (yes/no) decision about whether the NP should have an article. Even with these levels of overall agreement, precision and recall in error detection were only 0.52 and 0.80, respectively. However, when the classifier was allowed to skip cases where its confidence was low, precision rose to 0.90, with 0.40 recall. Additional improvements in performance may require features that reflect general knowledge to handle phenomena such as indirect prior reference. In August 2005, the classifier was deployed as a component of Educational Testing Service's Criterion℠ Online Writing Evaluation Service.
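
The precision/recall trade-off described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how confidence was measured, so the sketch assumes confidence is simply the classifier's top class probability. The names `flag_error`, `precision_recall`, `evaluate`, and `TOY_NPS`, and all probabilities in the toy data, are hypothetical and invented for illustration only.

```python
# Illustrative sketch only; not the system described in the paper.
# Assumption: "confidence" = probability of the classifier's top-ranked article.

def flag_error(probs, written, threshold=0.0):
    """Return the suggested article when the classifier disagrees with the
    writer and its top probability reaches `threshold`; otherwise None."""
    best = max(probs, key=probs.get)
    if best != written and probs[best] >= threshold:
        return best
    return None  # agrees with the writer, or skipped as low-confidence


def precision_recall(flags, gold_errors):
    """Compute precision and recall from parallel per-NP boolean lists."""
    pairs = list(zip(flags, gold_errors))
    tp = sum(1 for f, g in pairs if f and g)
    fp = sum(1 for f, g in pairs if f and not g)
    fn = sum(1 for f, g in pairs if g and not f)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up data: (class probabilities, article the writer used, is it an error?)
TOY_NPS = [
    ({"a/an": 0.10, "the": 0.85, "zero": 0.05}, "zero", True),
    ({"a/an": 0.45, "the": 0.40, "zero": 0.15}, "the", False),
    ({"a/an": 0.05, "the": 0.15, "zero": 0.80}, "zero", False),
    ({"a/an": 0.55, "the": 0.35, "zero": 0.10}, "the", True),
]


def evaluate(threshold):
    """Flag errors at the given confidence threshold and score them."""
    flags = [flag_error(p, w, threshold) is not None for p, w, _ in TOY_NPS]
    gold = [is_error for _, _, is_error in TOY_NPS]
    return precision_recall(flags, gold)


if __name__ == "__main__":
    for t in (0.0, 0.8):
        precision, recall = evaluate(t)
        print(f"threshold={t:.1f} precision={precision:.2f} recall={recall:.2f}")
```

On this toy data, raising the threshold from 0.0 to 0.8 moves precision from 2/3 to 1.0 while recall drops from 1.0 to 0.5. The direction of the change matches the trade-off reported in the abstract, though the numbers themselves carry no empirical weight.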