Preprocessing a document image is an important step in handling deformations such as noise and the handwriting irregularities that produce baseline skew, word skew, and character skew, accents placed above or below the text line, and parts of neighboring text lines that touch. This paper proposes a novel preprocessing technique for handwritten documents that handles several of these deformations, including touching components, overlapping components, skewed lines, and words with individual skews, and builds a clean text image with them removed. Based on an analysis of Indian script character shapes and a literature survey, it proposes a new sequence of preprocessing methods. A binarized image is sub-sampled and its connected components are extracted. These components are dilated and thinned, then passed to a Hough transform that detects both global and local skew for line extraction. Word segmentation is performed by computing the distances between adjacent components in the text line image and classifying each distance as either an inter-word gap or an inter-character gap. The extracted words can be used to produce a properly aligned text image or for text conversion using OCR.
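The word segmentation step described above can be illustrated with a minimal sketch. The abstract does not say how distances are classified, so the rule used here (split at the midpoint of the largest jump in the sorted gaps) is an assumption, as are the names `gaps_between`, `gap_threshold`, and `segment_words` and the representation of each component as a horizontal extent `(x_start, x_end)` within an already extracted, deskewed text line.

```python
from typing import List, Tuple

# Horizontal extent (x_start, x_end) of one connected component, in pixels.
# Hypothetical representation; the paper's data structures are not given.
Box = Tuple[int, int]


def gaps_between(boxes: List[Box]) -> List[int]:
    """Distances between adjacent components, scanning left to right."""
    ordered = sorted(boxes)
    gaps = []
    right_edge = ordered[0][1]
    for x_start, x_end in ordered[1:]:
        # Overlapping components have a gap of zero.
        gaps.append(max(0, x_start - right_edge))
        right_edge = max(right_edge, x_end)
    return gaps


def gap_threshold(gaps: List[int]) -> float:
    """Assumed rule: midpoint of the largest jump in the sorted gaps."""
    ordered = sorted(gaps)
    if len(ordered) < 2:
        return float("inf")  # too few gaps to separate two classes
    _, i = max((ordered[k + 1] - ordered[k], k) for k in range(len(ordered) - 1))
    return (ordered[i] + ordered[i + 1]) / 2


def segment_words(boxes: List[Box]) -> List[List[Box]]:
    """Group components into words; gaps above the threshold are inter-word."""
    ordered = sorted(boxes)
    if not ordered:
        return []
    gaps = gaps_between(ordered)
    threshold = gap_threshold(gaps)
    words = [[ordered[0]]]
    for box, gap in zip(ordered[1:], gaps):
        if gap > threshold:
            words.append([box])      # inter-word gap: start a new word
        else:
            words[-1].append(box)    # inter-character gap: same word
    return words


line = [(0, 10), (12, 20), (23, 30), (50, 60), (62, 70)]
words = segment_words(line)
```

For the sample line, the gaps are 2, 3, 20, and 2 pixels, so the components fall into two words. Note that this thresholding rule always finds a largest jump, so a line containing a single word with uneven character spacing would be split; a real system would likely need a more robust classifier.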