Efficient Transcript Mapping to Ease the Creation of Document Image Segmentation Ground Truth with Text-Image Alignment

  • Authors:
  • Nikolaos Stamatopoulos;Georgios Louloudis;Basilis Gatos

  • Venue:
  • ICFHR '10 Proceedings of the 2010 12th International Conference on Frontiers in Handwriting Recognition
  • Year:
  • 2010

Abstract

One of the major issues in document image processing is the efficient creation of ground truth for training and evaluation. Because a large number of tools must be trained and evaluated under realistic conditions, a quick, low-cost way to create the corresponding ground truth is needed. Moreover, the need to associate the correct text with the corresponding image area at both the text line and word levels makes ground truth creation a difficult, tedious and costly task. In this paper, we introduce an efficient transcript mapping technique to ease the construction of document image segmentation ground truth that includes text-image alignment. The proposed text line transcript mapping technique is based on a Hough transform guided by the number of text lines. For the word segmentation ground truth, a gap classification technique constrained by the number of words is used. Experimental results show that, for handwritten documents, the proposed technique saves more than 90% of the time needed for ground truth creation and text-image alignment.
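
To illustrate how a known word count can constrain word segmentation, here is a minimal Python sketch. It is not the authors' gap classifier, which the abstract does not specify. It assumes each text line is already reduced to the horizontal extents of its connected components, and it uses a simplified rule: if the transcript has N words, the N-1 widest gaps are treated as word boundaries. The function name `map_words_to_line` and the `(x_start, x_end)` input format are hypothetical.

```python
from typing import List, Tuple

Extent = Tuple[int, int]  # (x_start, x_end) of one connected component


def map_words_to_line(components: List[Extent],
                      transcript: str) -> List[Tuple[int, int, str]]:
    """Assign each transcript word a horizontal span on the text line.

    Simplifying assumption (not from the paper): the N-1 widest gaps
    between consecutive components are the word boundaries, where N is
    the number of words in the transcript.
    """
    words = transcript.split()
    comps = sorted(components)  # left-to-right order
    if not words or len(comps) < len(words):
        raise ValueError("need at least one component per transcript word")

    # gaps[i] is the distance between component i and component i + 1.
    gaps = [comps[i + 1][0] - comps[i][1] for i in range(len(comps) - 1)]

    # Keep the indices of the N-1 widest gaps, restored to reading order.
    n_cuts = len(words) - 1
    widest = sorted(range(len(gaps)), key=lambda i: gaps[i], reverse=True)
    cut_after = sorted(widest[:n_cuts])

    # Each word covers the components up to and including its cut index;
    # the last word runs to the final component.
    result = []
    start = 0
    for word, end in zip(words, cut_after + [len(comps) - 1]):
        group = comps[start:end + 1]
        result.append((group[0][0], max(c[1] for c in group), word))
        start = end + 1
    return result


line_components = [(0, 10), (12, 20), (35, 50), (52, 60), (80, 95)]
aligned = map_words_to_line(line_components, "the quick fox")
```

Because the number of cuts is fixed by the transcript, the result always contains exactly one image span per word. That one-to-one pairing is what makes the text-image alignment automatic; a practical system would still need a manual correction step for lines where the widest gaps are not the true word boundaries.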