Towards Improving the Accuracy of Telugu OCR Systems

  • Authors:
  • P. Pavan Kumar;Chakravarthy Bhagvati;Atul Negi;Arun Agarwal;B. L. Deekshatulu

  • Venue:
  • ICDAR '11 Proceedings of the 2011 International Conference on Document Analysis and Recognition
  • Year:
  • 2011

Abstract

Designing a high-accuracy OCR system is a challenging task because system performance is affected by each of its component modules. Each module has its own impact on the overall accuracy, and an improvement in any one module is reflected in overall system performance. In the present work, we have developed an OCR system for Telugu. Our experiments on a corpus of about 1000 images have shown that system performance is degraded by broken characters produced by the binarization module and by improper character segmentation. We therefore address the handling of broken characters and poor segmentation. To handle broken characters, we propose a novel approach based on feedback from the distance measure used by the classifier. For character segmentation, our proposed approach exploits the orthographic properties of Telugu script. As a result, a significant improvement in system performance is obtained. These algorithms are generic and may be applicable to other Indian scripts, especially south Indian scripts. Our experiments evaluate end-to-end system performance, which has not previously been reported in the literature.
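The abstract gives no algorithmic detail, so the following is only a minimal sketch of what distance feedback for broken characters could look like, not the authors' method. It assumes a toy nearest-template classifier with a Jaccard distance over pixel sets and a greedy left-to-right merge rule; the names `classify`, `merge_broken_fragments`, `TEMPLATES`, and the glyph shapes are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Pixels = FrozenSet[Tuple[int, int]]  # (row, col) foreground pixels of one fragment


def normalize(p: Pixels) -> Pixels:
    """Translate a fragment so its bounding box starts at (0, 0)."""
    r0 = min(r for r, _ in p)
    c0 = min(c for _, c in p)
    return frozenset((r - r0, c - c0) for r, c in p)


def classify(p: Pixels, templates: Dict[str, Pixels]) -> Tuple[str, float]:
    """Return (label, distance) of the nearest template (assumed Jaccard distance)."""
    q = normalize(p)
    best_label, best_dist = "", float("inf")
    for label, t in templates.items():
        dist = 1.0 - len(q & t) / len(q | t)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist


def merge_broken_fragments(
    fragments: List[Pixels], templates: Dict[str, Pixels]
) -> List[Tuple[str, Pixels]]:
    """Greedy merge: join neighbouring fragments only when the classifier's
    distance for the merged shape beats both fragments taken separately."""
    out: List[Tuple[str, Pixels]] = []
    current = None
    for frag in fragments:  # fragments are assumed sorted left to right
        if current is None:
            current = frag
            continue
        merged = current | frag
        d_cur = classify(current, templates)[1]
        d_frag = classify(frag, templates)[1]
        d_merged = classify(merged, templates)[1]
        if d_merged < min(d_cur, d_frag):
            current = merged  # feedback says the pieces belong together
        else:
            out.append((classify(current, templates)[0], current))
            current = frag
    if current is not None:
        out.append((classify(current, templates)[0], current))
    return out


# Hypothetical templates: a 3x5 ring and a 3-pixel vertical bar.
RING: Pixels = frozenset(
    [(0, c) for c in range(5)] + [(2, c) for c in range(5)] + [(1, 0), (1, 4)]
)
BAR: Pixels = frozenset([(0, 0), (1, 0), (2, 0)])
TEMPLATES = {"ring": RING, "bar": BAR}

# A ring broken by binarization (column 2 lost), followed by an intact bar.
left_half: Pixels = frozenset([(0, 0), (0, 1), (1, 0), (2, 0), (2, 1)])
right_half: Pixels = frozenset([(0, 3), (0, 4), (1, 4), (2, 3), (2, 4)])
bar: Pixels = frozenset([(0, 6), (1, 6), (2, 6)])

result = merge_broken_fragments([left_half, right_half, bar], TEMPLATES)
print([label for label, _ in result])
```

In this toy input the two halves of the ring are joined because the merged shape is closer to a template than either half alone, while the intact bar is left separate. A real system would use its own features, distance measure, and merge criteria.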