Training paradigms for correcting errors in grammar and usage

Authors:
Alla Rozovskaya;Dan Roth
Affiliations:
University of Illinois at Urbana-Champaign, Urbana, IL;University of Illinois at Urbana-Champaign, Urbana, IL
Venue:
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Year:
2010

Citing 15
Cited 16

Automated postediting of documents

AAAI '94 Proceedings of the twelfth national conference on Artificial intelligence (vol. 1)
A Winnow-Based Approach to Context-Sensitive Spelling Correction

Machine Learning - Special issue on natural language learning
Large Margin Classification Using the Perceptron Algorithm

Machine Learning - The Eleventh Annual Conference on computational Learning Theory
Scaling Up Context-Sensitive Text Correction

Proceedings of the Thirteenth Conference on Innovative Applications of Artificial Intelligence Conference
Automatic error detection in the Japanese learners' English spoken data

ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 2
Memory-based learning for article generation

ConLL '00 Proceedings of the 2nd workshop on Learning language in logic and the 4th conference on Computational natural language learning - Volume 7
Detecting errors in English article usage by non-native speakers

Natural Language Engineering
Correcting ESL errors using phrasal SMT techniques

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Modeling Discriminative Global Inference

ICSC '07 Proceedings of the International Conference on Semantic Computing
The importance of syntactic parsing and inference in semantic role labeling

Computational Linguistics
A classifier-based approach to preposition and determiner error correction in L2 English

COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
GenERRate: generating errors for use in grammatical error detection

EdAppsNLP '09 Proceedings of the Fourth Workshop on Innovative Use of NLP for Building Educational Applications
Native judgments of non-native usage: experiments in preposition error detection

HumanJudge '08 Proceedings of the Workshop on Human Judgements in Computational Linguistics
Language modeling for determiner selection

NAACL-Short '07 Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers
Annotating ESL errors: challenges and rewards

IUNLPBEA '10 Proceedings of the NAACL HLT 2010 Fifth Workshop on Innovative Use of NLP for Building Educational Applications

Annotating ESL errors: challenges and rewards

IUNLPBEA '10 Proceedings of the NAACL HLT 2010 Fifth Workshop on Innovative Use of NLP for Building Educational Applications
Generating confusion sets for context-sensitive error correction

EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Generating learner-like morphological errors in Russian

COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Grammatical error correction with alternating structure optimization

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Algorithm selection and model adaptation for ESL correction tasks

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Creating a manually error-tagged and shallow-parsed learner corpus

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
High-order sequence modeling for language learner error detection

IUNLPBEA '11 Proceedings of the 6th Workshop on Innovative Use of NLP for Building Educational Applications
Exploiting syntactic and distributional information for spelling correction with web-scale n-gram models

EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
University of Illinois system in HOO text correction shared task

ENLG '11 Proceedings of the 13th European Workshop on Natural Language Generation
NUS at the HOO 2012 shared task

Proceedings of the Seventh Workshop on Building Educational Applications Using NLP
HOO 2012 error recognition and correction shared task: Cambridge University submission report

Proceedings of the Seventh Workshop on Building Educational Applications Using NLP
Korea University system in the HOO 2012 shared task

Proceedings of the Seventh Workshop on Building Educational Applications Using NLP
The UI system in the HOO 2012 shared task on error correction

Proceedings of the Seventh Workshop on Building Educational Applications Using NLP
A meta learning approach to grammatical error correction

ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2
Grammar error correction using pseudo-error sentences and domain adaptation

ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2
Bucking the trend: improved evaluation and annotation practices for ESL error detection systems

Language Resources and Evaluation

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper proposes a novel approach to the problem of training classifiers to detect and correct grammar and usage errors in text by selectively introducing mistakes into the training data. When training a classifier, we would like the distribution of examples seen in training to be as similar as possible to the one seen in testing. In error correction problems, such as correcting mistakes made by second language learners, a system is generally trained on correct data, since annotating data for training is expensive. Error generation methods avoid expensive data annotation and create training data that resemble non-native data with errors. We apply error generation methods and train classifiers for detecting and correcting article errors in essays written by non-native English speakers; we show that training on data that contain errors produces higher accuracy when compared to a system that is trained on clean native data. We propose several training paradigms with error generation and show that each such paradigm is superior to training a classifier on native data. We also show that the most successful error generation methods are those that use knowledge about the article distribution and error patterns observed in non-native text.