In this paper, we consider the problem of evaluating character image generators that model distortions encountered in optical character recognition (OCR). While a number of such defect models have been proposed, the contention that they produce the desired result is typically argued in an ad hoc and informal way. We introduce a rigorous and more pragmatic definition of when a model is accurate: we say a defect model is validated if the OCR errors induced by the model are indistinguishable from the errors encountered when using real scanned documents. We describe four measures to quantify this similarity, and compare and contrast them using over ten million scanned and synthesized characters in three fonts. The measures differentiate effectively between different fonts and different scans of the same font regardless of the underlying text.
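The validation criterion can be illustrated with a minimal sketch. It is not one of the four measures from the paper, which the abstract does not specify. It assumes that OCR output has already been aligned character by character with the ground truth, and it uses total variation distance between confusion-frequency profiles as a stand-in similarity measure. The function names `error_profile` and `total_variation` are hypothetical.

```python
from collections import Counter


def error_profile(pairs):
    """Relative frequency of each (true_char, ocr_char) confusion.

    `pairs` is an iterable of (true_char, ocr_char) tuples, assumed to be
    already aligned. Correctly recognized characters are ignored.
    """
    errors = Counter((t, o) for t, o in pairs if t != o)
    total = sum(errors.values())
    if total == 0:
        return {}
    return {confusion: count / total for confusion, count in errors.items()}


def total_variation(p, q):
    """Total variation distance between two confusion profiles.

    0.0 means the profiles are identical; 1.0 means they share no confusions.
    This is an assumed stand-in measure, not one taken from the paper.
    """
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Toy data: aligned (truth, OCR output) pairs from real scans and from
# images synthesized by a defect model.
real_pairs = [("e", "c"), ("e", "c"), ("l", "1"), ("a", "a")]
synthetic_pairs = [("e", "c"), ("l", "1"), ("l", "1"), ("a", "a")]

distance = total_variation(error_profile(real_pairs),
                           error_profile(synthetic_pairs))
print(f"distance between error profiles: {distance:.3f}")
```

Under this sketch, a defect model would count as closer to validated the smaller the distance between the real and synthetic profiles. With real data, the character alignment would itself have to be computed first, for example with an edit-distance alignment.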