An OCR System to Read Two Indian Language Scripts: Bangla and Devnagari (Hindi)

  • Authors:
  • B. B. Chaudhuri; U. Pal

  • Venue:
  • ICDAR '97 Proceedings of the 4th International Conference on Document Analysis and Recognition
  • Year:
  • 1997

Abstract

An OCR system is proposed that can read two Indian language scripts, Bangla and Devnagari (Hindi), the most popular ones in the Indian subcontinent. Because both scripts originate from the ancient Brahmi script, they share many features, so a single system can be modeled to recognize them. In the proposed model, document digitization, skew detection, text line segmentation and zone separation, word and character segmentation, and grouping of characters into basic, modifier and compound categories are performed for both scripts by the same set of algorithms. The feature sets and classification tree, as well as the knowledge base required for error correction (such as the lexicon), differ between Bangla and Devnagari. The system performs well on single-font text printed on clean documents.
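Text line segmentation and zone separation are among the stages that the abstract says are shared by both scripts, but it does not say how they are implemented. The sketch below assumes a common approach, a horizontal projection profile over a binarized, deskewed page, and is not presented as the authors' method. It also assumes that the densest row of each text line marks the headline (the horizontal stroke running along the top of most Bangla and Devnagari characters), which a zone separator could use as a boundary between the upper zone and the rest of the line. The names `segment_lines`, `headline_row` and `min_ink` are hypothetical.

```python
from typing import List, Tuple

Bitmap = List[List[int]]  # 1 = ink (black) pixel, 0 = background


def horizontal_profile(image: Bitmap) -> List[int]:
    """Count ink pixels in each row of a binary image."""
    return [sum(row) for row in image]


def segment_lines(image: Bitmap, min_ink: int = 1) -> List[Tuple[int, int]]:
    """Hypothetical helper: return half-open (start_row, end_row) ranges for
    each run of rows with at least `min_ink` ink pixels. Rows below the
    threshold are treated as gaps between text lines."""
    lines = []
    start = None
    for y, count in enumerate(horizontal_profile(image)):
        if count >= min_ink and start is None:
            start = y  # first row of a new text line
        elif count < min_ink and start is not None:
            lines.append((start, y))  # gap row closes the current line
            start = None
    if start is not None:  # a line that runs to the bottom edge
        lines.append((start, len(image)))
    return lines


def headline_row(image: Bitmap, line: Tuple[int, int]) -> int:
    """Hypothetical helper: assume the headline is the row with the most ink
    inside the line (ties go to the topmost row)."""
    start, end = line
    profile = horizontal_profile(image)
    return max(range(start, end), key=lambda y: profile[y])


# Toy page with two "text lines"; the first row of each is densest.
page = [
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0],
    [1, 0, 0, 1, 0],
]
lines = segment_lines(page)
print(lines)                                    # [(1, 3), (4, 6)]
print([headline_row(page, ln) for ln in lines])  # [1, 4]
```

Lines are returned as half-open row ranges, so `page[start:end]` slices out one line. On a real scanned page, a higher `min_ink` threshold would be needed so that isolated noise pixels are not mistaken for text.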