A survey of handwritten document pre-processing techniques and customizing for Indic script

Authors:
V. Hole; L. Ragha
Affiliations:
Smt. Indira Gandhi College of Engineering, Koparkhairane, Navi Mumbai, India; Ramrao Adik Institute of Technology, Navi Mumbai, India
Venue:
Proceedings of the International Conference & Workshop on Emerging Trends in Technology
Year:
2011



Abstract

Preprocessing of a document image is an important step for handling deformations such as noise and the handwriting complexities that can produce baseline skew, word skew, and character skew, accents placed above or below the text line, and parts of neighboring text lines that touch. The paper proposes a novel preprocessing technique for handwritten documents that handles some of the deformations usually present in such documents, such as touching components, overlapping components, skewed lines, and words with individual skews, and builds a proper text image with all these deformations removed. Based on an analysis of Indian script character shapes and a literature survey, it proposes a new sequence of preprocessing methods. A binarized image is sub-sampled and its connected components are extracted. These components are dilated and thinned, and the result is given to a Hough transform for both global and local skew detection for line extraction. Word segmentation is done by computing the distances between adjacent components in the text line image and classifying those distances as either inter-word gaps or inter-character gaps. The extracted words can be used to produce a properly aligned text image or for text conversion using OCR.
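
The word segmentation step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how gaps are classified, so the rule used here (a gap counts as inter-word when it exceeds a multiple of the mean positive gap) is an assumption made for illustration only, as are the names `segment_words` and `gap_factor` and the representation of each connected component by its horizontal extent.

```python
from typing import List, Tuple

# Horizontal extent (x_start, x_end) of one connected component in a text line.
# This representation is an assumption; the paper may use a different one.
Box = Tuple[int, int]


def segment_words(boxes: List[Box], gap_factor: float = 1.5) -> List[List[Box]]:
    """Group the components of a single text line into words.

    Assumed rule (not taken from the paper): a gap between adjacent
    components is an inter-word gap when it is larger than
    gap_factor times the mean of all positive gaps in the line;
    otherwise it is an inter-character gap.
    """
    if not boxes:
        return []

    ordered = sorted(boxes)

    # Distance from the right edge of everything seen so far to the
    # left edge of the next component; overlapping components give 0.
    gaps = []
    right = ordered[0][1]
    for x0, x1 in ordered[1:]:
        gaps.append(max(0, x0 - right))
        right = max(right, x1)

    positive = [g for g in gaps if g > 0]
    if not positive:
        # Every component touches or overlaps its neighbor: one word.
        return [ordered]

    threshold = gap_factor * sum(positive) / len(positive)

    # Start a new word at every gap classified as inter-word.
    words = [[ordered[0]]]
    for box, gap in zip(ordered[1:], gaps):
        if gap > threshold:
            words.append([box])
        else:
            words[-1].append(box)
    return words


if __name__ == "__main__":
    line = [(0, 10), (12, 20), (22, 30), (50, 60), (62, 70)]
    for word in segment_words(line):
        print(word)
```

A single global threshold of this kind cannot split a line whose gaps are all similar in size, so it is only a stand-in for whatever classifier the paper actually uses.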