Automatic localization and correction of line segmentation errors

  • Authors:
  • Anand Mishra;Naveen Sankaran;Viresh Ranjan;C. V. Jawahar

  • Affiliations:
  • IIIT Hyderabad, India;IIIT Hyderabad, India;IIIT Hyderabad, India;IIIT Hyderabad, India

  • Venue:
  • Proceeding of the workshop on Document Analysis and Recognition
  • Year:
  • 2012

Abstract

Text line segmentation is a basic step in any OCR system, and its failure degrades the performance of OCR engines. This is especially true for Indian languages because of the nature of their scripts. Many segmentation algorithms have been proposed in the literature. These algorithms often fail to adapt dynamically to a given page and thus tend to yield poor segmentation for specific regions or specific pages. In this work, we design a text line segmentation post-processor that automatically localizes and corrects segmentation errors. The proposed post-processor, which works in a "learning by examples" framework, is not only independent of the segmentation algorithm but also robust to the diversity of scanned pages. We show over 5% improvement in text line segmentation on a large dataset of scanned pages for multiple Indian languages.