Automatic Identification of English, Chinese, Arabic, Devnagari and Bangla Script Line

  • Venue: ICDAR '01 Proceedings of the Sixth International Conference on Document Analysis and Recognition
  • Year: 2001

Abstract

In general, a document page may contain several scripts. For Optical Character Recognition (OCR) of such a page, the scripts must be separated before they are fed to their individual OCR systems. This paper proposes an automatic technique for identifying printed Roman, Chinese, Arabic, Devnagari and Bangla text lines in a single document. Script identification uses shape-based features, statistical features, and features derived from the water reservoir concept. The proposed scheme achieves an accuracy of about 97.33%.
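
The abstract names the water reservoir concept without describing it. As a rough illustration only, and not the authors' implementation, the sketch below assumes the usual reading of the idea: water poured onto a character from above collects in cavities that are open at the top, and the amount collected can serve as a feature. The function name `top_reservoir_area`, the 0/1 list-of-rows image format, and the simplification to the upper ink profile of each column (which ignores overhanging strokes) are all assumptions made for this example.

```python
def top_reservoir_area(image):
    """Estimate how many background pixels would hold water poured from above.

    `image` is a list of rows, each a list of 0/1 values (1 = ink).
    Hypothetical helper: it works on the upper ink profile only, so
    overhanging strokes are not modelled.
    """
    if not image or not image[0]:
        return 0
    rows, cols = len(image), len(image[0])

    # Height of the topmost ink pixel in each column, measured from the
    # bottom row. A column with no ink has height 0.
    heights = []
    for c in range(cols):
        h = 0
        for r in range(rows):
            if image[r][c]:
                h = rows - r
                break
        heights.append(h)

    # Highest wall seen so far when scanning from the left and from the right.
    left, running = [0] * cols, 0
    for c in range(cols):
        running = max(running, heights[c])
        left[c] = running
    right, running = [0] * cols, 0
    for c in range(cols - 1, -1, -1):
        running = max(running, heights[c])
        right[c] = running

    # Water above a column is bounded by the lower of the two enclosing walls.
    return sum(min(left[c], right[c]) - heights[c] for c in range(cols))


# A "U" shape is open at the top, so it stores water.
u_shape = [
    [1, 0, 1],
    [1, 0, 1],
    [1, 1, 1],
]
# An "n" shape is open at the bottom, so water poured from above runs off.
n_shape = [
    [1, 1, 1],
    [1, 0, 1],
    [1, 0, 1],
]
print(top_reservoir_area(u_shape))  # 2
print(top_reservoir_area(n_shape))  # 0
```

Under these assumptions, applying the same function to a vertically flipped image would give a bottom reservoir estimate. How such values are combined with the shape-based and statistical features is not stated in the abstract.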