Lexical text correction systems are typically built around a central step: when a malformed token is found in the input text, a set of correction candidates for the token is retrieved from a given background dictionary. In previous work we introduced a method for selecting correction candidates that is fast and yields small candidate sets with high recall. As a prerequisite, ground truth data were used to find a set of important substitutions, merges and splits that represent characteristic errors found in the text. This prior knowledge was then used to fine-tune the selection of correction candidates. Here we show that an appropriate set of possible substitutions, merges and splits for the input text can be retrieved without any ground truth data. In the new approach, we compute an error profile of the erroneous input text in a fully automated way, using so-called error dictionaries. From this profile, suitable sets of substitutions, merges and splits are derived. Error profiling with error dictionaries is simple and very fast. As an overall result we obtain an adaptive form of candidate selection that is very efficient, does not need ground truth data, and leads to small candidate sets with high recall.
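The profiling idea can be illustrated with a minimal Python sketch. The abstract does not specify how error dictionaries are built or how the profile is thresholded, so everything here is an assumption made for illustration: the pattern list `PATTERNS`, the helper names `build_error_dictionary`, `error_profile` and `select_patterns`, and the `min_count` cutoff are all hypothetical. The sketch generates non-words from a small lexicon by applying one candidate pattern at one position, looks up input tokens in the resulting error dictionary, and keeps the patterns that occur often enough.

```python
from collections import Counter

# Hypothetical candidate patterns (correct -> erroneous), chosen for illustration:
# two one-to-one substitutions, one symbol rewritten as two, two rewritten as one.
PATTERNS = [("e", "c"), ("l", "1"), ("m", "rn"), ("rn", "m")]


def build_error_dictionary(lexicon, patterns):
    """Map each generated non-word to the pattern that produced it.

    Assumption: every entry contains exactly one error, and strings that are
    themselves lexicon words are skipped.
    """
    words = set(lexicon)
    error_dict = {}
    for word in sorted(words):  # sorted for deterministic results
        for good, bad in patterns:
            start = word.find(good)
            while start != -1:
                garbled = word[:start] + bad + word[start + len(good):]
                if garbled not in words:
                    error_dict.setdefault(garbled, (good, bad))
                start = word.find(good, start + 1)
    return error_dict


def error_profile(tokens, error_dict):
    """Count how often each pattern explains a token of the input text."""
    return Counter(error_dict[t] for t in tokens if t in error_dict)


def select_patterns(profile, min_count=2):
    """Keep patterns seen at least min_count times (hypothetical threshold)."""
    return [pattern for pattern, n in profile.most_common() if n >= min_count]


lexicon = ["the", "modern", "letter", "time"]
tokens = ["thc", "modcrn", "lettcr", "tirne", "the", "rnodern", "1etter"]

error_dict = build_error_dictionary(lexicon, PATTERNS)
profile = error_profile(tokens, error_dict)
selected = select_patterns(profile)
print(profile)
print(selected)
```

No ground truth is consulted at any point: the profile comes only from the lexicon, the candidate patterns and the noisy text itself. In this toy run the rarely seen pattern `("l", "1")` falls below the threshold and would not be used to restrict candidate selection.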