Context-sensitive spelling correction (CSSC) is a widely accepted and long-studied formalization of the problem of finding and fixing contextually incorrect words. We argue that CSSC has limitations as a model and propose a weakened CSSC model (RWTD) that partially counters them. We weaken the CSSC model by dropping its word-correction role, so RWTD focuses solely on finding words that require correction. The correction itself is then performed by a human or by a CSSC solution. We propose a preliminary solution for the RWTD model that differs from related CSSC work in several ways. It does not rely on a set of confusion lists, and it detects almost any class of typo rather than only a limited set of confusion typos. It offers a flexible trade-off between the time a human is willing to spend on the task and the quality of the proofreading. It does not require POS tagging and may be applied seamlessly to different languages. Experimental running times prove acceptable for real-world applications. We report real-word typos in the Brown corpus that were exposed by implementing our solution. We also discuss experiments that apply the solution to other real-world test texts and demonstrate improved false-positive and hit rates.
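
The detection-only setting with a human time budget can be illustrated with a minimal sketch. It is not the solution described above: the scoring rule (average bigram support from a reference text) is a stand-in heuristic chosen only for illustration, and the names `bigram_counts`, `rank_suspects`, and `budget` are hypothetical. What it shares with the abstract is the interface: no confusion lists, no POS tagging, no attempted correction, and a ranked list whose length the proofreader chooses.

```python
from collections import Counter


def bigram_counts(tokens):
    """Count adjacent word pairs in a reference token list."""
    return Counter(zip(tokens, tokens[1:]))


def rank_suspects(tokens, ref_bigrams, budget):
    """Return up to `budget` (index, word) pairs, most suspicious first.

    ASSUMPTION: suspicion is approximated by how rarely a word's
    neighbouring bigrams occur in the reference counts. This is an
    illustrative heuristic, not the method from the paper.
    """
    scored = []
    for i, word in enumerate(tokens):
        neighbours = []
        if i > 0:
            neighbours.append(ref_bigrams[(tokens[i - 1], word)])
        if i + 1 < len(tokens):
            neighbours.append(ref_bigrams[(word, tokens[i + 1])])
        # Average over available neighbours so that edge words are not
        # penalised for having only one bigram.
        support = sum(neighbours) / len(neighbours) if neighbours else 0.0
        scored.append((support, i, word))
    scored.sort()  # lowest support first; ties broken by position
    return [(i, word) for _, i, word in scored[:budget]]


reference = "i want a piece of cake and a piece of pie".split()
text = "i want a peace of cake".split()
suspects = rank_suspects(text, bigram_counts(reference), budget=1)
```

Raising `budget` hands the reader more candidates to inspect, which is the time-versus-quality trade-off in miniature: a larger budget costs more reading time but is less likely to miss a typo.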