Stroke-model-based character extraction from gray-level document images

Authors:
Xiangyun Ye;M. Cheriet;C. Y. Suen
Affiliations:
Centre for Pattern Recognition & Machine Intelligence, Concordia Univ., Montreal, Que.
Venue:
IEEE Transactions on Image Processing
Year:
2001



Abstract

Global gray-level thresholding techniques, such as Otsu's method, and local gray-level thresholding techniques, such as edge-based segmentation or adaptive thresholding, are powerful in extracting character objects from simple or slowly varying backgrounds. However, they are found to be insufficient when the backgrounds include sharply varying contours or fonts of different sizes. A stroke model is proposed to depict the local features of character objects as double edges of a predefined size. This model enables us to detect thin connected components selectively, while ignoring relatively large backgrounds that appear complex. Meanwhile, since the stroke width restriction is fully factored in, the proposed technique can be used to extract characters in predefined font sizes. To process large volumes of documents efficiently, a hybrid method is proposed for character extraction from various backgrounds. Using the measurement of class separability to differentiate images with simple backgrounds from those with complex backgrounds, the hybrid method can process documents with different backgrounds by applying the appropriate methods. Experiments on extracting handwriting from a check image, as well as machine-printed characters from scene images, demonstrate the effectiveness of the proposed model.
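The ideas named in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes the class separability measure is the standard Otsu ratio of between-class variance to total variance, and it uses a simplified one-dimensional double-edge response (a dark pixel responds only if brighter pixels lie within a fixed distance on both sides). The function names, the `width` parameter, and the `eta_min` cut-off are hypothetical and chosen only for illustration.

```python
import numpy as np


def otsu_separability(image, levels=256):
    """Return (threshold, eta) for a non-negative integer gray-level image.

    eta is the between-class variance at the Otsu threshold divided by
    the total variance, so it lies in [0, 1]. Pixels <= threshold form
    the darker class.
    """
    hist = np.bincount(image.ravel(), minlength=levels).astype(float)
    prob = hist / hist.sum()
    grays = np.arange(levels)

    omega = np.cumsum(prob)        # probability of the darker class per cut
    mu = np.cumsum(prob * grays)   # cumulative first moment per cut
    mu_total = mu[-1]

    denom = omega * (1.0 - omega)
    sigma_b = np.zeros(levels)
    valid = denom > 1e-12          # skip cuts that leave one class empty
    sigma_b[valid] = (mu_total * omega[valid] - mu[valid]) ** 2 / denom[valid]

    threshold = int(np.argmax(sigma_b))
    sigma_total = np.sum(prob * (grays - mu_total) ** 2)
    eta = sigma_b[threshold] / sigma_total if sigma_total > 0 else 0.0
    return threshold, eta


def double_edge_response(row, width):
    """Simplified double-edge response along one scan line (assumption).

    A pixel responds only if a brighter pixel exists within `width`
    pixels on BOTH sides. Dark runs at most `width` pixels wide respond
    along their whole extent; dark runs of 2 * width pixels or more give
    no response, and neither does a single step edge.
    """
    row = row.astype(float)
    response = np.zeros(row.size)
    for x in range(row.size):
        left = row[max(0, x - width):x]
        right = row[x + 1:x + 1 + width]
        if left.size == 0 or right.size == 0:
            continue  # no neighbourhood on one side at the border
        response[x] = max(0.0, min(left.max(), right.max()) - row[x])
    return response


def choose_method(image, eta_min=0.8):
    """Hypothetical hybrid rule: global thresholding when classes separate well."""
    _, eta = otsu_separability(image)
    return "global" if eta >= eta_min else "stroke-model"


if __name__ == "__main__":
    line = np.full(30, 200)
    line[10:13] = 50   # thin dark stroke, 3 pixels wide
    line[20:30] = 50   # wide dark region, 10 pixels wide
    print(double_edge_response(line, width=5))
```

In this sketch a clean two-level image yields a separability close to 1 and is routed to global thresholding, while an image whose histogram does not split cleanly would be routed to the stroke-based path. The cut-off value would need to be tuned on real data.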