This paper describes a new program, correct, which takes words rejected by the Unix® spell program, proposes a list of candidate corrections, and sorts them by probability. The probability scores are the novel contribution of this work. Probabilities are based on a noisy channel model. It is assumed that the typist knows what words he or she wants to type but some noise is added on the way to the keyboard (in the form of typos and spelling errors). Using a classic Bayesian argument of the kind that is popular in the speech recognition literature (Jelinek, 1985), one can often recover the intended correction, c, from a typo, t, by finding the correction c that maximizes Pr(c) Pr(t|c). The first factor, Pr(c), is a prior model of word probabilities; the second factor, Pr(t|c), is a model of the noisy channel that accounts for spelling transformations on letter sequences (e.g., insertions, deletions, substitutions and reversals). Both sets of probabilities were trained on data collected from the Associated Press (AP) newswire. This text is ideally suited for this purpose since it contains a large number of typos (about two thousand per month).
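The ranking step described above, choosing the c that maximizes Pr(c) Pr(t|c), can be illustrated with a minimal Python sketch. It is not the correct program itself. The function names, the toy word counts, and the single flat probability per edit type are all assumptions made for illustration. The abstract does not say how the channel probabilities are parameterized beyond the four edit types, so the flat values here are placeholders for a trained model. The sketch also assumes each typo is exactly one edit away from its correction.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def single_edit_candidates(typo):
    """Map each string one edit away from `typo` to the typist error
    (insertion, deletion, substitution, reversal) that would explain it."""
    candidates = {}
    for i in range(len(typo) + 1):
        left, right = typo[:i], typo[i:]
        if right:
            # Typist inserted a stray letter: undo by dropping it.
            candidates.setdefault(left + right[1:], "insertion")
        if len(right) > 1:
            # Typist reversed two adjacent letters: swap them back.
            candidates.setdefault(
                left + right[1] + right[0] + right[2:], "reversal")
        for ch in ALPHABET:
            if right:
                # Typist substituted one letter for another.
                candidates.setdefault(left + ch + right[1:], "substitution")
            # Typist deleted a letter: undo by adding it back.
            candidates.setdefault(left + ch + right, "deletion")
    candidates.pop(typo, None)  # the typo is not a correction of itself
    return candidates


def rank_corrections(typo, word_counts, channel_prob):
    """Score dictionary candidates by Pr(c) * Pr(t|c), normalize the
    scores to sum to 1, and return them sorted, best first."""
    total = sum(word_counts.values())
    scored = []
    for cand, op in single_edit_candidates(typo).items():
        if cand in word_counts:
            prior = word_counts[cand] / total      # Pr(c)
            scored.append((cand, prior * channel_prob[op]))  # times Pr(t|c)
    norm = sum(score for _, score in scored)
    return sorted(((c, s / norm) for c, s in scored), key=lambda x: -x[1])


# Toy data (assumed, not from the paper): word frequencies and one flat
# probability per edit type standing in for a trained channel model.
TOY_COUNTS = {"actress": 10, "across": 50, "acres": 20, "access": 15,
              "caress": 1, "cress": 1, "the": 500}
TOY_CHANNEL = {"insertion": 0.25, "deletion": 0.25,
               "substitution": 0.25, "reversal": 0.25}

if __name__ == "__main__":
    for word, prob in rank_corrections("acress", TOY_COUNTS, TOY_CHANNEL):
        print(f"{word}\t{prob:.3f}")
```

With a flat channel model the ranking reduces to the word prior alone. The channel factor only starts to reorder candidates once Pr(t|c) is estimated from real typo data, which is the role the AP newswire training plays in the abstract.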