A survey of handwritten document pre-processing techniques and customizing for Indic script

  • Authors:
  • V. Hole; L. Ragha

  • Affiliations:
  • Smt. Indira Gandhi College of Engineering, Koparkhairane, Navi Mumbai, India; Ramrao Adik Institute of Technology, Navi Mumbai, India

  • Venue:
  • Proceedings of the International Conference & Workshop on Emerging Trends in Technology
  • Year:
  • 2011

Abstract

Preprocessing of a document image is an important step for handling deformations such as noise and the handwriting complexities that result in baseline skew, word skew, and character skew; accents may also be placed either above or below the text line, and parts of neighboring text lines may be connected. The paper proposes a novel preprocessing technique for handwritten documents that handles some of the deformations usually present in such documents, such as touching components, overlapping components, skewed lines, and words with individual skews, and builds a proper text image with these deformations removed. Based on an analysis of Indian script character shapes and a literature survey, it proposes a new sequence of preprocessing methods. A binarized image is sub-sampled and its connected components are extracted. These components are dilated and thinned, and are then given to the Hough transform for both global and local skew detection for line extraction. Word segmentation is done by computing the distances between adjacent components in the text line image and classifying those distances as either inter-word gaps or inter-character gaps. The extracted words can be used to produce a properly aligned text image or for text conversion using OCR.
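
The word segmentation step described in the abstract (measure the distance between adjacent components in a text line, then label each distance as an inter-word or inter-character gap) can be illustrated with a minimal sketch. The abstract does not say how the distances are classified, so the rule used here, comparing each gap against the mean gap of the line, is only an assumption for illustration and not the authors' method. The names `Box`, `adjacent_gaps`, and `segment_words` are hypothetical, and each component is reduced to its horizontal extent `(x_start, x_end)`.

```python
from typing import List, Optional, Tuple

# Horizontal extent (x_start, x_end) of one connected component in a text line.
Box = Tuple[int, int]


def adjacent_gaps(boxes: List[Box]) -> List[int]:
    """Return the horizontal distance between each pair of neighbouring components."""
    if len(boxes) < 2:
        return []
    ordered = sorted(boxes)
    gaps = []
    right_edge = ordered[0][1]
    for x_start, x_end in ordered[1:]:
        # Overlapping components give a gap of zero rather than a negative value.
        gaps.append(max(0, x_start - right_edge))
        right_edge = max(right_edge, x_end)
    return gaps


def segment_words(boxes: List[Box], threshold: Optional[float] = None) -> List[List[Box]]:
    """Group components into words by classifying each gap.

    A gap larger than `threshold` is treated as an inter-word gap; any other
    gap is treated as an inter-character gap. When no threshold is given, the
    mean gap of the line is used (an assumption, not taken from the paper).
    """
    if not boxes:
        return []
    ordered = sorted(boxes)
    gaps = adjacent_gaps(ordered)
    if threshold is None:
        threshold = sum(gaps) / len(gaps) if gaps else 0.0
    words = [[ordered[0]]]
    for box, gap in zip(ordered[1:], gaps):
        if gap > threshold:
            words.append([box])    # inter-word gap: start a new word
        else:
            words[-1].append(box)  # inter-character gap: extend the current word
    return words


line = [(0, 10), (12, 20), (22, 30), (45, 55), (57, 66)]
words = segment_words(line)
```

In this example the gaps are 2, 2, 15, and 2 pixels, so only the 15-pixel gap exceeds the mean and the line splits into two words. A fixed mean-based threshold is fragile on lines with a single word or very uneven spacing; a caller can pass an explicit `threshold` instead.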