Fringe Map Based Text Line Segmentation of Printed Telugu Document Images

  • Authors:
  • Vijaya Kumar Koppula; Atul Negi

  • Venue:
  • ICDAR '11 Proceedings of the 2011 International Conference on Document Analysis and Recognition
  • Year:
  • 2011

Abstract

Text line segmentation is a crucial step that can greatly influence the accuracy of an OCR system. One of the major obstacles to building high-accuracy OCR systems for Indic scripts has been the text line segmentation problem. For Telugu script in particular, this problem has yet to be adequately addressed by research. Methods commonly used for Roman script are not applicable because of the inherent complexity of Telugu script. Previous approaches to Telugu OCR in the literature take a simplified view of the problem, which leads to errors in line segmentation. The problem is compounded in old documents that were typeset manually and have non-uniform print quality. In this work we propose a new method based on the fringe map concept. In a fringe map, each pixel of the binary image is associated with a fringe number that denotes its distance to the nearest black pixel. We use this fringe value information to segment text lines. First, we locate peak fringe numbers (PFNs). PFNs that do not lie between lines are filtered out. The PFNs between adjacent lines are used to construct a region. The segmenting path between adjacent lines is then found by joining the filtered PFNs of that region.
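
As an illustration of the fringe map idea described in the abstract, the following is a minimal sketch in Python, not the authors' implementation. It rests on several assumptions that the abstract does not confirm: the distance is measured as city-block (4-connected) distance, the image is a list of rows with 1 for black and 0 for white, and a peak fringe number is approximated as a vertical local maximum within a single column. The names `fringe_map` and `column_peaks` are hypothetical.

```python
from collections import deque


def fringe_map(image):
    """Return, for each pixel, the city-block distance to the nearest black pixel.

    `image` is a list of equal-length rows; 1 means black (ink), 0 means white.
    The city-block metric is an assumption; the abstract only says "distance".
    Pixels stay None if the image contains no black pixel at all.
    """
    rows, cols = len(image), len(image[0])
    dist = [[None] * cols for _ in range(rows)]
    queue = deque()

    # Every black pixel is a source with fringe number 0.
    for r in range(rows):
        for c in range(cols):
            if image[r][c] == 1:
                dist[r][c] = 0
                queue.append((r, c))

    # Multi-source breadth-first search over 4-connected neighbours.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and dist[nr][nc] is None:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist


def column_peaks(fringe, col):
    """Return interior rows where the fringe value in `col` is a vertical local maximum.

    This is an illustrative stand-in for locating peak fringe numbers (PFNs);
    the paper's actual peak definition and filtering rules may differ.
    """
    peaks = []
    for r in range(1, len(fringe) - 1):
        value = fringe[r][col]
        if value > 0 and value >= fringe[r - 1][col] and value >= fringe[r + 1][col]:
            peaks.append(r)
    return peaks


# Two solid "text lines" (rows 0 and 6) separated by five white rows.
page = [[1, 1, 1]] + [[0, 0, 0] for _ in range(5)] + [[1, 1, 1]]
fringe = fringe_map(page)
gap_rows = column_peaks(fringe, 1)
```

In this toy page the fringe numbers rise from each text line toward the middle of the white gap, so the peak in each column marks a candidate point for a segmenting path. Filtering the peaks and joining them into a path, as the abstract describes, is not attempted here.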