We describe the realization of a dictionary-based lexical postprocessing approach. The input is a character hypotheses lattice (CHL), which is compared with the words of the vocabulary using a generalization of the weighted edit distance. The search for the best word is a depth-first traversal of the paths of the CHL, directed by several heuristics to achieve reasonable processing speed without significantly degrading the recognition rate. An iterative, supervised learning algorithm is proposed to determine the costs of the edit operations. Experiments show that this method significantly improves recognition accuracy.
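To make the idea of a weighted edit distance generalized to a lattice concrete, here is a minimal sketch in Python. It is not the method of the paper: it uses an exhaustive dynamic program instead of the heuristic depth-first search, and the costs are fixed by hand rather than learned. The lattice encoding (edges `(from_node, to_node, char, cost)` with nodes numbered in topological order) and the names `lattice_edit_distance` and `best_word` are assumptions made for illustration only.

```python
import math
from collections import defaultdict


def lattice_edit_distance(edges, n_nodes, word,
                          sub_cost=None, ins_cost=1.0, del_cost=1.0):
    """Cheapest alignment of `word` with any path from node 0 to node n_nodes-1.

    Assumed (hypothetical) encoding: each edge is (u, v, char, cost) with u < v,
    where `cost` is the recognizer's penalty for that character hypothesis.
    """
    if sub_cost is None:
        # Default: unit substitution cost, zero for an exact match.
        sub_cost = lambda a, b: 0.0 if a == b else 1.0

    outgoing = defaultdict(list)
    for u, v, ch, cost in edges:
        if not u < v:
            raise ValueError("edges must go from a lower to a higher node")
        outgoing[u].append((v, ch, cost))

    m = len(word)
    # dist[u][j]: cheapest way to reach node u having consumed word[:j].
    dist = [[math.inf] * (m + 1) for _ in range(n_nodes)]
    dist[0][0] = 0.0

    for u in range(n_nodes):
        for j in range(m + 1):
            cur = dist[u][j]
            if cur == math.inf:
                continue
            if j < m:
                # Insertion: word[j] has no counterpart in the lattice.
                dist[u][j + 1] = min(dist[u][j + 1], cur + ins_cost)
            for v, ch, cost in outgoing[u]:
                # Deletion: the lattice character has no counterpart in the word.
                dist[v][j] = min(dist[v][j], cur + cost + del_cost)
                if j < m:
                    # Match or substitution of the hypothesis against word[j].
                    step = cur + cost + sub_cost(ch, word[j])
                    dist[v][j + 1] = min(dist[v][j + 1], step)

    return dist[n_nodes - 1][m]


def best_word(edges, n_nodes, vocabulary, **costs):
    """Return the vocabulary word with the smallest lattice edit distance."""
    return min(vocabulary,
               key=lambda w: lattice_edit_distance(edges, n_nodes, w, **costs))


# Toy lattice: the first segment is read either as "c" + "l" or as a single "d".
example_edges = [
    (0, 1, "c", 0.2), (1, 2, "l", 0.2), (0, 2, "d", 0.1),
    (2, 3, "e", 0.0), (3, 4, "a", 0.0), (4, 5, "r", 0.0),
]
example_choice = best_word(example_edges, 6, ["clear", "dear", "deer"])
```

In the toy lattice the segmentation alternatives share nodes, so a single pass scores every path against a word. A heuristic search such as the one described above would prune most of these paths instead of evaluating them all.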