A Survey of Methods and Strategies in Character Segmentation
IEEE Transactions on Pattern Analysis and Machine Intelligence
A Two-Stage Classifier for Broken and Blurred Digits in Forms
ICPR '98 Proceedings of the 14th International Conference on Pattern Recognition-Volume 2 - Volume 2
Hi-index | 0.01 |
This work addresses the segmentation of numeric fields in forms presenting blurring, breaks and touching in digits. In an OCR system, the segmentation phase plays a determinant role in the global accuracy of the system. Segmentation is basically addressed form two approaches: (a) as an isolated phase in the OCR process, and (b) as interacting with the recognition of the segmented item. In this work, we have considered the first one in order to develop a robust new cost function combining vertical projection, Tsujimoto metric and background information. Unlike other techniques reported in the literature, ours obtains a near-optimum number of break points in fields containing broken, blurred and touching characters, leading to high accuracy in the global OCR system. Our experiments with a sample including about 11,283 numeric fields in 144 forms (more than 50,000 digits of that kind) show that 99.74% of fields have been correctly segmented. The new cost function only made 50 errors.