Exploiting syntactic and distributional information for spelling correction with web-scale n-gram models

Authors:
Wei Xu; Joel Tetreault; Martin Chodorow; Ralph Grishman; Le Zhao
Affiliations:
New York University, NY; Educational Testing Service, Princeton, NJ; Hunter College of CUNY, New York, NY; New York University, NY; Carnegie Mellon University, Pittsburgh, PA
Venue:
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Year:
2011



Abstract

We propose a novel way of incorporating dependency parse and word co-occurrence information into a state-of-the-art web-scale n-gram model for spelling correction. The syntactic and distributional information provides evidence beyond what a web-scale n-gram corpus supplies and is especially helpful in mitigating data sparsity. Experimental results show that adding syntactic features to n-gram-based models reduces errors significantly, by up to 12.4% relative to the current state of the art. The word co-occurrence information shows promise but improves overall accuracy only slightly.
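The abstract does not spell out the model. The sketch below is therefore only a hedged illustration of the general idea and is not the authors' system. It shows how syntactic evidence can supplement n-gram evidence for a confusion set such as {there, their}. The n-gram part sums log(1 + count) over every n-gram that spans the target position, which is similar in spirit to the SUMLM approach. The extra term comes from a count of (candidate, dependency head) pairs. Several elements are hypothetical assumptions: the count tables are invented toy numbers, and the `weight` parameter and the functions `ngram_score`, `head_score`, and `correct` are labels for illustration only.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical toy stand-in for a web-scale n-gram corpus. The counts are
# invented for illustration; they do not come from the paper or a real corpus.
NGRAM_COUNTS: Dict[Tuple[str, ...], int] = {
    ("parked", "there"): 50,
    ("there", "is"): 5000,
    ("their", "car"): 700,
}

# Hypothetical toy counts of (candidate word, dependency head) pairs, such as
# might be collected from a parsed corpus. Also invented for illustration.
HEAD_COUNTS: Dict[Tuple[str, str], int] = {
    ("their", "car"): 1200,
    ("there", "car"): 15,
}


def ngram_score(tokens: Sequence[str], i: int, candidate: str, max_n: int = 3) -> float:
    """Sum log(1 + count) over every n-gram (2 <= n <= max_n) covering
    position i, after substituting `candidate` at that position."""
    toks = list(tokens)
    toks[i] = candidate
    score = 0.0
    for n in range(2, max_n + 1):
        # Window starts s with s <= i <= s + n - 1 that stay inside the sentence.
        for s in range(max(0, i - n + 1), min(i, len(toks) - n) + 1):
            score += math.log1p(NGRAM_COUNTS.get(tuple(toks[s:s + n]), 0))
    return score


def head_score(candidate: str, head: Optional[str]) -> float:
    """Extra evidence from the candidate's syntactic head (0 if none is given)."""
    if head is None:
        return 0.0
    return math.log1p(HEAD_COUNTS.get((candidate, head), 0))


def correct(tokens: Sequence[str], i: int, confusion_set: Sequence[str],
            head: Optional[str] = None, weight: float = 1.0) -> str:
    """Return the confusion-set member with the highest combined score."""
    return max(confusion_set,
               key=lambda c: ngram_score(tokens, i, c) + weight * head_score(c, head))


if __name__ == "__main__":
    sentence = ["he", "parked", "there", "shiny", "new", "car"]
    # N-gram evidence alone: only "parked there" is attested, so "there" wins.
    print(correct(sentence, 2, ["there", "their"]))
    # The head "car" supplies evidence that adjacent n-grams miss, because
    # "shiny new" separates the target word from its head.
    print(correct(sentence, 2, ["there", "their"], head="car"))
```

The script prints `there` and then `their`. The example is built to show the sparsity point from the abstract. The frequent pair "their car" never appears as a contiguous n-gram in this sentence. A dependency relation, however, links the two words directly across the intervening modifiers.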