Text correction systems rely on a core mechanism that generates suitable correction suggestions for garbled input tokens. Current systems, which are designed for documents written in modern language, use some form of approximate search in a given background lexicon. Because historical documents contain a large amount of spelling variation, special lexica for historical language can only offer restricted coverage. Hence historical language is often described in terms of a matching procedure that is applied to modern words. Given such a procedure and a base lexicon of modern words, the question arises of how to generate correction suggestions for garbled historical variants. In this paper we suggest an efficient algorithm that solves this problem. The algorithm is used for postcorrection of optical character recognition results on historical document collections.
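To make the problem statement concrete, the following is a minimal brute-force sketch, not the efficient algorithm the paper proposes. It assumes that the matching procedure can be written as a list of (modern, historical) substring rewrite patterns and that garbling is measured by plain Levenshtein distance; the function names, the pattern format, and the toy lexicon are all hypothetical and chosen only for illustration.

```python
from functools import lru_cache
from typing import List, Sequence, Set, Tuple

Pattern = Tuple[str, str]  # (modern substring, historical substring); assumed format


def levenshtein(a: str, b: str) -> int:
    """Standard edit distance (insert, delete, substitute), row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def historical_variants(word: str, patterns: Sequence[Pattern]) -> Set[str]:
    """Hypothetical variants of a modern word: at each position either copy
    the character or apply one pattern whose modern side matches there.
    The modern side of every pattern is assumed to be non-empty."""

    @lru_cache(maxsize=None)
    def expand(pos: int) -> frozenset:
        if pos == len(word):
            return frozenset({""})
        results = {word[pos] + rest for rest in expand(pos + 1)}
        for modern, historic in patterns:
            if word.startswith(modern, pos):
                results |= {historic + rest for rest in expand(pos + len(modern))}
        return frozenset(results)

    return set(expand(0))


def suggest(token: str, lexicon: Sequence[str], patterns: Sequence[Pattern],
            max_dist: int) -> List[Tuple[str, str, int]]:
    """Return (modern word, closest variant, distance) for every lexicon word
    that has a variant within max_dist of the garbled token, sorted by
    distance and then by modern word."""
    hits = []
    for modern_word in lexicon:
        # sorted() makes the choice among equally close variants deterministic
        variant = min(sorted(historical_variants(modern_word, patterns)),
                      key=lambda v: levenshtein(token, v))
        dist = levenshtein(token, variant)
        if dist <= max_dist:
            hits.append((modern_word, variant, dist))
    return sorted(hits, key=lambda h: (h[2], h[0]))


# Toy data (assumed): modern German "teil" with the historical spelling "theil".
lexicon = ["teil", "tun", "tal"]
patterns = [("t", "th")]
print(suggest("thcil", lexicon, patterns, max_dist=1))
```

In this toy run the OCR token "thcil" is one edit away from the hypothetical historical variant "theil", so the modern word "teil" is proposed. The sketch enumerates every variant of every lexicon word, which grows quickly with lexicon size and pattern count; avoiding that enumeration is the kind of cost an efficient algorithm has to address.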