Localization, Extraction and Recognition of Text in Telugu Document Images

  • Authors:
  • Atul Negi; K. Nikhil Shanker; Chandra Kanth Chereddi

  • Venue:
  • ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 2
  • Year:
  • 2003

Abstract

In this paper we present a system to locate, extract and recognize Telugu text. The circular nature of Telugu script is exploited for segmenting text regions using the Hough Transform. First, the Hough Transform for circles is performed on the Sobel gradient magnitude of the image to locate text. The located circles are filled to yield text regions, followed by Recursive XY Cuts to segment the regions into paragraphs, lines and word regions. A region merging process with a bottom-up approach envelops individual words. Local binarization of the word MBRs yields connected components containing glyphs for recognition. The recognition process first identifies candidate characters by a zoning technique and then constructs structural feature vectors by cavity analysis. Finally, if required, crossing count based non-linear normalization and scaling is performed before template matching. The segmentation process succeeds in extracting text from images with complex Non-Manhattan layouts. The recognition process gave a character recognition accuracy of 97%-98%.