A bilingual Gurmukhi-English OCR based on multiple script identifiers and language models

Authors: Gurpreet Singh Lehal
Affiliations: Punjabi University, Patiala, Punjab, India
Venue: Proceedings of the 4th International Workshop on Multilingual OCR
Year: 2013

Abstract

English words occur frequently in Gurmukhi texts, and a monolingual Gurmukhi OCR recognizes such words as garbage. The Gurmukhi OCR therefore needs bilingual capability so that it can recognize English text as well. Adding this capability, however, reduces recognition accuracy on monolingual text because of errors in script identification: even a system with 99% script identification accuracy loses about 1% recognition accuracy on monolingual text. In this paper, we present a bilingual OCR that recognizes both English and Gurmukhi scripts with no significant reduction in recognition accuracy, relative to the monolingual Gurmukhi OCR, on monolingual Gurmukhi text. This is achieved by using multiple script identification engines and language models for both English and Gurmukhi scripts. This is the first system developed that recognizes, with high accuracy, document images containing mixed Gurmukhi and English text as well as Gurmukhi-only or English-only text.
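The abstract does not say how the script identifiers and language models are combined, so the following Python sketch is only an illustration of the general idea, not the paper's method. It assumes that a word routed to the wrong recognizer is always misrecognized, that several independent identifiers each vote on a word's script, and that a language model score (higher is better) is available for each script's recognition candidate. The function names, the 0.97 baseline accuracy, and the tie-breaking rule are all hypothetical.

```python
from collections import Counter


def monolingual_accuracy(ocr_accuracy, script_id_accuracy):
    """Word accuracy on monolingual text when each word must first be routed
    to the correct recognizer (assumption: a misrouted word is always wrong)."""
    return ocr_accuracy * script_id_accuracy


def choose_script(votes, lm_scores):
    """Pick a script for one word.

    votes:     labels returned by the independent script identifiers.
    lm_scores: script -> language model score of that script's recognition
               candidate (hypothetical log-probabilities, higher is better).
    """
    counts = Counter(votes)
    if len(counts) == 1:
        # All identifiers agree, so accept their decision directly.
        return votes[0]
    # Identifiers disagree: let the language models arbitrate.
    return max(lm_scores, key=lm_scores.get)


# Hypothetical 97% monolingual OCR behind a 99% accurate script identifier.
print(round(monolingual_accuracy(0.97, 0.99), 4))

# Two of three identifiers say Gurmukhi, but the English candidate scores
# far better under its language model, so the word is treated as English.
print(choose_script(["gurmukhi", "english", "gurmukhi"],
                    {"gurmukhi": -8.0, "english": -1.5}))
```

Under these assumptions, a single identifier caps accuracy at the product of the two rates, which is roughly the one-point loss the abstract describes. Consulting the language models only when identifiers disagree is one way such a loss might be recovered.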