HVS inspired system for script identification in Indian multi-script documents

Authors:
Peeta Basa Pati;A. G. Ramakrishnan
Affiliations:
Department of Electrical Engineering, Indian Institute of Science, Bangalore, India;Department of Electrical Engineering, Indian Institute of Science, Bangalore, India
Venue:
DAS'06 Proceedings of the 7th international conference on Document Analysis Systems
Year:
2006

Abstract

Identifying the script of text in multi-script documents is an important first step in the design of an OCR system. Much work has been reported on Roman, Arabic, Chinese, Korean and Japanese scripts. Although some work involving Indian scripts has been reported, it is still at a nascent stage. For example, most of it assumes that the script changes only at the line level, which is rarely an acceptable assumption in the Indian scenario. In this work, we report a script identification algorithm that takes into account the fact that the script changes at the word level in most Indian bilingual or multilingual documents. Initially, we deal with identifying the script of words, using Gabor filters, in a bi-script scenario. Later, we extend this to tri-script and then five-script scenarios. The combination of Gabor features with a nearest neighbor classifier shows promising results. Words of different font styles and sizes are used. We show that our identification scheme, inspired by the Human Visual System (HVS) and using the same feature and classifier combination, works consistently well for every combination of scripts tested.
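
The feature and classifier pairing described in the abstract (Gabor filter responses fed to a nearest neighbor classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the filter parameters, so the frequencies, the number of orientations, the kernel size, the use of mean squared response as the feature, and every function name below are assumptions chosen for illustration. The synthetic stripe images produced by the hypothetical `stripes` helper merely stand in for word images of two scripts.

```python
import numpy as np

def gabor_kernel(freq, theta, sigma=3.0, size=15):
    """Even-symmetric Gabor kernel: Gaussian envelope times a cosine carrier.
    All parameter defaults are illustrative assumptions."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)  # coordinate along the carrier
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    kernel = envelope * np.cos(2.0 * np.pi * freq * xr)
    return kernel - kernel.mean()  # zero mean, so flat regions give no response

def gabor_features(image, freqs=(0.125, 0.25), n_orient=4):
    """Mean squared filter response per (frequency, orientation) channel,
    normalized to unit length. The feature choice is an assumption."""
    image = np.asarray(image, dtype=float)
    image = image - image.mean()  # reduce the effect of zero padding at borders
    feats = []
    for f in freqs:
        for k in range(n_orient):
            kern = gabor_kernel(f, np.pi * k / n_orient)
            shape = (image.shape[0] + kern.shape[0] - 1,
                     image.shape[1] + kern.shape[1] - 1)
            # Linear convolution via zero-padded FFTs
            spectrum = np.fft.rfft2(image, s=shape) * np.fft.rfft2(kern, s=shape)
            response = np.fft.irfft2(spectrum, s=shape)
            feats.append(np.mean(response ** 2))
    feats = np.array(feats)
    return feats / (np.linalg.norm(feats) + 1e-12)

def nearest_neighbor(train_feats, train_labels, query):
    """1-NN: return the label of the closest training feature vector."""
    dists = np.linalg.norm(np.asarray(train_feats) - query, axis=1)
    return train_labels[int(np.argmin(dists))]

def stripes(vertical, period=8, shape=(32, 64), phase=0):
    """Hypothetical toy 'word image': binary stripes in one direction."""
    rows, cols = np.indices(shape)
    coord = cols if vertical else rows
    return ((coord + phase) % period < period // 2).astype(float)

# Toy usage: one training sample per synthetic "script", then classify a query.
train = [gabor_features(stripes(True)), gabor_features(stripes(False))]
labels = ["script_A", "script_B"]
print(nearest_neighbor(train, labels, gabor_features(stripes(True, phase=3))))
```

In a real word-level setting, each training vector would come from a segmented word image with a known script label, and the same feature extractor and classifier would be reused unchanged for the bi-script, tri-script, and five-script cases.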