Towards more effective distance functions for word image matching
DAS '10 Proceedings of the 9th IAPR International Workshop on Document Analysis Systems
A post-processing scheme for Malayalam using statistical sub-character language models
DAS '10 Proceedings of the 9th IAPR International Workshop on Document Analysis Systems
Algorithms are presented that determine the visual relationships between word images in a document. These relationships include instances of common word images and common substrings that occur often in English-language text images. This information is then used to improve the performance of a commercial optical character recognition (OCR) algorithm. The algorithms compute clusters of equivalent word images as well as common initial and final substrings. Experimental results show a 40% reduction in word-level error rate on a test set of documents degraded by uniform noise.
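The abstract does not say how word images are compared or how the clusters are fed back into the OCR output, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes word images are equal-sized binary bitmaps flattened into strings of '0' and '1', uses a normalized Hamming distance with a hypothetical threshold to form clusters, and corrects OCR labels by majority vote inside each cluster. The function names, the threshold, and the sample data are all illustrative assumptions, and the matching of common initial and final substrings is not covered.

```python
from collections import Counter


def hamming_fraction(a, b):
    """Fraction of differing pixels between two flattened binary images."""
    if len(a) != len(b) or not a:
        return 1.0  # assumption: unequal or empty images never match
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster_word_images(images, threshold=0.1):
    """Group image indices whose pixel difference is within the threshold.

    Single-link grouping via union-find; the threshold is a hypothetical value.
    """
    parent = list(range(len(images)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(images)):
        for j in range(i + 1, len(images)):
            if hamming_fraction(images[i], images[j]) <= threshold:
                parent[find(j)] = find(i)

    clusters = {}
    for i in range(len(images)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


def vote_labels(clusters, ocr_labels):
    """Replace each OCR label with the most common label in its cluster."""
    corrected = list(ocr_labels)
    for members in clusters:
        winner, _ = Counter(ocr_labels[i] for i in members).most_common(1)[0]
        for i in members:
            corrected[i] = winner
    return corrected


# Made-up sample data: three near-identical bitmaps and one different bitmap.
images = [
    "1111000011110000",
    "1111000011110001",  # one noisy pixel
    "1111000011110000",
    "0000111100001111",
]
ocr_labels = ["the", "the", "tbe", "of"]  # third label is an OCR error

clusters = cluster_word_images(images)
print(clusters)                           # [[0, 1, 2], [3]]
print(vote_labels(clusters, ocr_labels))  # ['the', 'the', 'the', 'of']
```

In this toy run the three visually equivalent images fall into one cluster, and the single misrecognized label is outvoted by the two correct ones. A real system would need a distance measure that tolerates noise and misalignment, which plain Hamming distance does not.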