A spelling correction program based on a noisy channel model

Authors:
Mark D. Kernighan;Kenneth W. Church;William A. Gale
Affiliations:
AT&T Bell Laboratories, Murray Hill, N.J.;AT&T Bell Laboratories, Murray Hill, N.J.;AT&T Bell Laboratories, Murray Hill, N.J.
Venue:
COLING '90 Proceedings of the 13th conference on Computational linguistics - Volume 2
Year:
1990

Citing 0
Cited 38

Spelling correction for the telecommunications network for the deaf

Communications of the ACM
Techniques for automatically correcting words in text

ACM Computing Surveys (CSUR)
Dedication to William A. Gale

Natural Language Engineering
The use of error tags in ARTFL's Encyclopédie: does good error identification lead to good error correction?

Proceedings of the workshop on Student research
Cooperative error handling and shallow processing

EACL '95 Proceedings of the seventh conference on European chapter of the Association for Computational Linguistics
Bayesian grammar induction for language modeling

ACL '95 Proceedings of the 33rd annual meeting on Association for Computational Linguistics
An empirical study of smoothing techniques for language modeling

ACL '96 Proceedings of the 34th annual meeting on Association for Computational Linguistics
Lattice-based word identification in CLARE

ACL '92 Proceedings of the 30th annual meeting on Association for Computational Linguistics
Context-based spelling correction for Japanese OCR

COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 2
Correcting real-word spelling errors by restoring lexical cohesion

Natural Language Engineering
Pronunciation modeling for improved spelling correction

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Phoneme-to-text transcription system with an infinite vocabulary

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Exploring distributional similarity based models for query spelling correction

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Matching inconsistently spelled names in automatic speech recognizer output for information retrieval

HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Learning a spelling error model from search query logs

HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
A phrase-based statistical model for SMS text normalization

COLING-ACL '06 Proceedings of the COLING/ACL on Main conference poster sessions
Corrupted queries in Spanish text retrieval: error correction vs. N-Grams

Proceedings of the 2nd ACM workshop on Improving non english web searching
Concrete assignments for teaching NLP in an M.S. program

TeachNLP '05 Proceedings of the Second ACL Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics
Teaching NLP to computer science majors via applications and experiments

TeachCL '08 Proceedings of the Third Workshop on Issues in Teaching Computational Linguistics
Effective spelling correction in web queries and run-time DB construction

Proceedings of the 2009 International Conference on Hybrid Information Technology
Using the web for language independent spellchecking and autocorrection

EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2
Learning phrase-based spelling error models from clickthrough data

ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Hashing-based approaches to spelling correction of personal names

EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
A large scale ranker-based system for search query spelling correction

COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Managing misspelled queries in IR applications

Information Processing and Management: an International Journal
A graph approach to spelling correction in domain-centric search

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Why press backspace?: understanding user input behaviors in Chinese Pinyin input method

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
CMU Haitian Creole-English translation system for WMT 2011

WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
The latent words language model

Computer Speech and Language
CHIME: an efficient error-tolerant Chinese pinyin input method

IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Three
Measuring contextual fitness using error contexts extracted from the Wikipedia revision history

EACL '12 Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics
Autonomous self-assessment of autocorrections: exploring text message dialogues

NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Automatic grading of scientific inquiry

Proceedings of the Seventh Workshop on Building Educational Applications Using NLP
On using context for automatic correction of non-word misspellings in student essays

Proceedings of the Seventh Workshop on Building Educational Applications Using NLP
A unified approach to transliteration-based text input with online spelling correction

EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
A discriminative model for query spelling correction with latent structural SVM

EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Fast multi-task learning for query spelling correction

Proceedings of the 21st ACM international conference on Information and knowledge management
Mirroring the real world in social media: twitter, geolocation, and sentiment analysis

Proceedings of the 2013 international workshop on Mining unstructured big data using natural language processing

Abstract

This paper describes a new program, correct, which takes words rejected by the Unix® spell program, proposes a list of candidate corrections, and sorts them by probability. The probability scores are the novel contribution of this work. Probabilities are based on a noisy channel model. It is assumed that the typist knows what words he or she wants to type, but some noise is added on the way to the keyboard (in the form of typos and spelling errors). Using a classic Bayesian argument of the kind that is popular in the speech recognition literature (Jelinek, 1985), one can often recover the intended correction, c, from a typo, t, by finding the correction c that maximizes Pr(c) Pr(t|c). The first factor, Pr(c), is a prior model of word probabilities; the second factor, Pr(t|c), is a model of the noisy channel that accounts for spelling transformations on letter sequences (e.g., insertions, deletions, substitutions and reversals). Both sets of probabilities were trained on data collected from the Associated Press (AP) newswire. This text is ideally suited for this purpose since it contains a large number of typos (about two thousand per month).
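
The ranking rule in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The word counts and channel probabilities below are invented toy values, the typo "acress" is only an illustrative input, and the names single_edit_candidates, rank, toy_counts and toy_channel are hypothetical. The sketch also assumes a single flat probability per operation type and add-0.5 smoothing for the prior, which is a simplification of a channel model over letter sequences. Since Pr(t) is the same for every candidate, maximizing Pr(c|t) is equivalent to maximizing Pr(c) Pr(t|c), so the scores are only normalized at the end for readability.

```python
from typing import Dict, List, Tuple

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def single_edit_candidates(typo: str, lexicon: Dict[str, int]) -> List[Tuple[str, str]]:
    """Return (word, operation) pairs for lexicon words that one typist
    error (insertion, deletion, substitution, reversal) could turn into typo."""
    found: Dict[str, str] = {}
    n = len(typo)
    for i in range(n + 1):
        # The typist dropped a letter: put a letter back into the typo.
        for ch in ALPHABET:
            found.setdefault(typo[:i] + ch + typo[i:], "deletion")
    for i in range(n):
        # The typist added a letter: take it out of the typo.
        found.setdefault(typo[:i] + typo[i + 1:], "insertion")
        # The typist hit the wrong letter: try every other letter here.
        for ch in ALPHABET:
            if ch != typo[i]:
                found.setdefault(typo[:i] + ch + typo[i + 1:], "substitution")
    for i in range(n - 1):
        # The typist swapped two adjacent letters: swap them back.
        if typo[i] != typo[i + 1]:
            swapped = typo[:i] + typo[i + 1] + typo[i] + typo[i + 2:]
            found.setdefault(swapped, "reversal")
    return sorted((w, op) for w, op in found.items() if w in lexicon)


def rank(typo: str, lexicon: Dict[str, int],
         channel: Dict[str, float]) -> List[Tuple[str, str, float]]:
    """Sort candidates by Pr(c) * Pr(t|c), normalized over the candidate list."""
    total = sum(lexicon.values())
    scored = []
    for word, op in single_edit_candidates(typo, lexicon):
        prior = (lexicon[word] + 0.5) / total  # assumed add-0.5 smoothing
        scored.append((prior * channel[op], word, op))
    norm = sum(score for score, _, _ in scored)
    return [(w, op, score / norm) for score, w, op in sorted(scored, reverse=True)]


# Toy values, invented for illustration only (not taken from the AP data).
toy_counts = {"actress": 10, "cress": 1, "caress": 1, "access": 20,
              "across": 80, "acres": 30, "the": 1000}
toy_channel = {"deletion": 0.4, "insertion": 0.2,
               "substitution": 0.1, "reversal": 0.3}

ranking = rank("acress", toy_counts, toy_channel)
for word, op, prob in ranking:
    print(f"{word:8s} {op:12s} {prob:.3f}")
```

With these toy numbers a frequent word can outrank a rarer one even when its channel probability is lower, which is the trade-off between the prior and the channel model that the abstract describes.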