The detection and correction of false friends (also called real-word errors) is a notoriously difficult problem. On realistic data, automatic correction has so far failed to reach the break-even point: the infelicitous corrections it introduced outnumbered the useful ones. We present a new approach in which we first compute a profile of the error channel for the given text. During correction, the profile restricts attention to a small set of "suspicious" lexical tokens of the input text, namely those for which it is "plausible" to assume that the token represents a false friend. Using conventional word trigram statistics for disambiguation, we obtain a correction method that can be successfully applied to unrestricted text. In experiments on OCR documents, we show significant accuracy gains from fully automatic correction of false friends.
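The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the profile format (a mapping from hypothetical (intended, observed) character patterns to weights), the function names, the additive trigram score, and the `margin` threshold are all assumptions made for illustration. A token counts as suspicious only if reversing a profiled error pattern yields another lexicon word, and a replacement is made only if trigram counts favour it.

```python
from collections import Counter

def candidates(token, profile, lexicon, min_weight=0.01):
    """Lexicon words the profiled error channel could have turned into `token`.

    `profile` maps (intended, observed) character patterns to a weight;
    this format is an assumption of the sketch, not taken from the paper.
    """
    found = set()
    for (intended, observed), weight in profile.items():
        if weight < min_weight or not observed:
            continue
        start = token.find(observed)
        while start != -1:
            # Undo the error pattern at this position.
            word = token[:start] + intended + token[start + len(observed):]
            if word != token and word in lexicon:
                found.add(word)
            start = token.find(observed, start + 1)
    return found

def trigram_score(tokens, i, word, trigrams):
    """Sum of the counts of the three trigrams covering position i."""
    padded = ["<s>", "<s>"] + tokens[:i] + [word] + tokens[i + 1:] + ["</s>", "</s>"]
    j = i + 2  # index of `word` in the padded sequence
    return sum(trigrams.get(tuple(padded[k:k + 3]), 0) for k in (j - 2, j - 1, j))

def correct(tokens, profile, lexicon, trigrams, margin=1):
    """Replace suspicious lexical tokens when trigram evidence favours a candidate."""
    out = list(tokens)
    for i, token in enumerate(tokens):
        if token not in lexicon:
            continue  # non-word errors are outside the scope of this sketch
        cands = candidates(token, profile, lexicon)
        if not cands:
            continue  # not suspicious under the profile: leave untouched
        base = trigram_score(out, i, token, trigrams)
        best = max(sorted(cands), key=lambda w: trigram_score(out, i, w, trigrams))
        if trigram_score(out, i, best, trigrams) >= base + margin:
            out[i] = best
    return out

# Toy data (invented for illustration): OCR often reads "rn" as "m".
lexicon = {"the", "modern", "modem", "is", "fast", "building"}
profile = {("rn", "m"): 0.2}
trigrams = Counter({("the", "modern", "building"): 5})
print(correct(["the", "modem", "building"], profile, lexicon, trigrams))
```

Restricting the trigram comparison to tokens flagged by the profile is what keeps the number of candidate corrections small; most lexical tokens are never examined, so they cannot be miscorrected.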