In this paper, we present a new text line extraction method for handwritten Arabic documents. The proposed technique is based on a generalized adaptive local connectivity map (ALCM) computed with a steerable directional filter. The algorithm is designed to handle the particularly difficult cases found in handwritten documents, such as fluctuating, touching, or crossing text lines. It consists of three steps. First, a steerable filter probes the foreground intensity along multiple directions at each pixel to generate the ALCM, which is then binarized with an adaptive thresholding algorithm to give a rough estimate of the text line locations. Second, connected component analysis classifies text and non-text patterns in the generated ALCM to refine those locations. Finally, the text lines are separated by superimposing the text line patterns from the ALCM on the original document image and extracting the connected components covered by the pattern mask. Experimental results on the DARPA MADCAT Arabic handwritten document data indicate that the method is robust and can correctly isolate handwritten text lines even in challenging document images.
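The three steps can be illustrated with a minimal, dependency-free sketch on a toy binary image. This is not the authors' implementation: the directional sum below is a crude stand-in for the steerable filter, a single relative threshold stands in for the adaptive thresholding, and a simple area test stands in for the text versus non-text classification. All function names and parameters (`half_len`, `slopes`, `rel_thresh`, `min_area`) are hypothetical.

```python
from collections import Counter, deque

def label_components(img):
    """4-connected component labelling; returns (label map, component count)."""
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if img[y][x] and not labels[y][x]:
                count += 1
                labels[y][x] = count
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count

def directional_alcm(img, half_len=4, slopes=(-0.25, 0.0, 0.25)):
    """Step 1 (assumed simplification): at each pixel, sum the ink along short
    segments of several slopes and keep the strongest directional response."""
    h, w = len(img), len(img[0])
    alcm = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            best = 0
            for s in slopes:
                total = 0
                for dx in range(-half_len, half_len + 1):
                    xx, yy = x + dx, y + round(s * dx)
                    if 0 <= xx < w and 0 <= yy < h:
                        total += img[yy][xx]
                best = max(best, total)
            alcm[y][x] = best
    return alcm

def extract_lines(img, half_len=4, rel_thresh=0.4, min_area=6):
    """Return a map giving each ink pixel the id of its text line (0 = unassigned)."""
    h, w = len(img), len(img[0])
    alcm = directional_alcm(img, half_len)
    # Binarize the ALCM (a global relative threshold, assumed here for brevity).
    thresh = rel_thresh * max(max(row) for row in alcm)
    mask = [[1 if v > 0 and v >= thresh else 0 for v in row] for row in alcm]
    # Step 2: keep only sufficiently large patterns as text line candidates.
    patterns, _ = label_components(mask)
    area = Counter(p for row in patterns for p in row if p)
    keep = {p for p, a in area.items() if a >= min_area}
    # Step 3: superimpose the patterns on the image; each ink component joins
    # the line pattern that covers most of its pixels.
    ink, _ = label_components(img)
    votes = {}
    for y in range(h):
        for x in range(w):
            if ink[y][x] and patterns[y][x] in keep:
                votes.setdefault(ink[y][x], Counter())[patterns[y][x]] += 1
    line_of = {comp: c.most_common(1)[0][0] for comp, c in votes.items()}
    return [[line_of.get(ink[y][x], 0) if ink[y][x] else 0 for x in range(w)]
            for y in range(h)]

# Toy page: two rows of short dashes (stand-ins for words) on rows 2 and 6.
dashes = [1 if x % 4 != 3 else 0 for x in range(20)]
page = [dashes[:] if y in (2, 6) else [0] * 20 for y in range(9)]
line_map = extract_lines(page)
```

In this toy case the individual dashes are separate connected components in the image, but the directional response bridges the gaps between them, so each row of dashes ends up under a single line pattern. A realistic version would operate on gray-scale scans with a true steerable filter bank and a locally adaptive threshold.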