English words frequently appear in Gurmukhi texts, and a monolingual Gurmukhi OCR outputs garbage for them. A Gurmukhi OCR therefore needs bilingual capability so that it can recognize English text as well. Adding this capability, however, reduces recognition accuracy on monolingual text because of errors in script identification: even a system with 99% script identification accuracy loses about 1% recognition accuracy on monolingual text. In this paper, we present a bilingual OCR that recognizes both English and Gurmukhi scripts and that, on monolingual Gurmukhi text, shows no significant reduction in recognition accuracy compared with the monolingual Gurmukhi OCR. This is achieved by using multiple script identification engines and language models for both English and Gurmukhi scripts. This is the first such system to recognize, with high accuracy, document images containing mixed Gurmukhi and English text or text in only one of the two scripts.
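The accuracy argument above can be illustrated with a small Python sketch. It is not the paper's method: the function names are hypothetical, the first function assumes that every word assigned to the wrong script is recognized incorrectly, and the second assumes that several script identification engines err independently and are combined by simple majority vote, which the abstract does not state.

```python
from math import comb


def effective_accuracy(recognizer_acc: float, script_id_acc: float) -> float:
    """Accuracy on monolingual text after a script identification stage.

    Assumption: a word assigned to the wrong script is always misrecognized,
    so only correctly identified words can be recognized correctly.
    """
    return recognizer_acc * script_id_acc


def majority_vote_accuracy(engine_acc: float, n_engines: int) -> float:
    """Probability that a majority of n engines picks the right script.

    Assumption: engines have equal accuracy and make independent errors;
    n_engines must be odd so that a majority always exists.
    """
    if n_engines < 1 or n_engines % 2 == 0:
        raise ValueError("n_engines must be a positive odd number")
    needed = n_engines // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_engines, k) * engine_acc**k * (1 - engine_acc) ** (n_engines - k)
        for k in range(needed, n_engines + 1)
    )


if __name__ == "__main__":
    # A 99%-accurate script identifier in front of a hypothetical 97% recognizer.
    single = effective_accuracy(0.97, 0.99)
    # The same recognizer behind three independent 99% engines with voting.
    voted = effective_accuracy(0.97, majority_vote_accuracy(0.99, 3))
    print(f"single engine: {single:.4f}, three engines: {voted:.4f}")
```

Under these assumptions, one 99% engine costs roughly one point of accuracy, while three voting engines cost far less. That matches the motivation given for combining several engines, although the actual combination scheme may differ.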