Script identification from Indian documents

Authors:
Gopal Datt Joshi;Saurabh Garg;Jayanthi Sivaswamy
Affiliations:
Centre for Visual Information Technology, IIIT, Hyderabad, India;Centre for Visual Information Technology, IIIT, Hyderabad, India;Centre for Visual Information Technology, IIIT, Hyderabad, India
Venue:
DAS'06 Proceedings of the 7th international conference on Document Analysis Systems
Year:
2006

Abstract

Automatic identification of the script in a document image supports many important applications, such as automatic archiving of multilingual documents, searching online archives of document images, and selecting a script-specific OCR engine in a multilingual environment. In this paper, we present a scheme to identify different Indian scripts from a document image. The scheme employs hierarchical classification using features consistent with human perception. These features are extracted from the responses of a multi-channel log-Gabor filter bank designed at an optimal scale and multiple orientations. In the first stage, the classifier groups the scripts into five major classes using global features. In the second stage, sub-classification is performed based on script-specific features. All features are extracted globally from a given text block, so the scheme does not depend on complex, reliable segmentation of the document image into lines and characters. The proposed scheme is therefore efficient and suitable for practical applications that process large volumes of data. The scheme has been tested on 10 Indian scripts and found to be robust to skew introduced during scanning and relatively insensitive to changes in font size. The proposed system achieves an overall classification accuracy of 97.11% on a large test data set. These results establish the utility of a global approach to script classification.
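
The two ideas in the abstract, global features from an oriented log-Gabor filter bank and a two-stage classifier, can be illustrated with a minimal sketch. It is not the authors' implementation. The number of orientations, the centre frequency `f0`, the bandwidth parameters, the normalized orientation-energy feature, and the nearest-centroid rule are all assumptions chosen for illustration, and the function names are hypothetical. The sketch also reuses one feature vector in both stages, whereas the abstract describes script-specific features in the second stage.

```python
import numpy as np


def log_gabor_bank(shape, n_orient=8, f0=0.125, sigma_ratio=0.65,
                   sigma_theta=np.pi / 12):
    """Build frequency-domain log-Gabor filters at one scale (assumed
    parameters), one filter per orientation."""
    rows, cols = shape
    fy = np.fft.fftfreq(rows)[:, None]   # vertical frequencies, cycles/pixel
    fx = np.fft.fftfreq(cols)[None, :]   # horizontal frequencies
    radius = np.hypot(fx, fy)
    radius[0, 0] = 1.0                   # placeholder to avoid log(0) at DC
    theta = np.arctan2(fy, fx)

    # Radial part: Gaussian on a logarithmic frequency axis, centred at f0.
    radial = np.exp(-np.log(radius / f0) ** 2
                    / (2 * np.log(sigma_ratio) ** 2))
    radial[0, 0] = 0.0                   # a log-Gabor filter has no DC response

    filters = []
    for k in range(n_orient):
        angle = k * np.pi / n_orient
        # Angular distance to the filter orientation, wrapped to [-pi, pi].
        d = np.arctan2(np.sin(theta - angle), np.cos(theta - angle))
        filters.append(radial * np.exp(-d ** 2 / (2 * sigma_theta ** 2)))
    return np.stack(filters)


def orientation_energies(block, **bank_kwargs):
    """Global feature vector: normalized response energy per orientation,
    computed over the whole text block with no line or character
    segmentation."""
    block = np.asarray(block, dtype=float)
    bank = log_gabor_bank(block.shape, **bank_kwargs)
    spectrum = np.fft.fft2(block - block.mean())
    responses = np.fft.ifft2(spectrum[None] * bank, axes=(-2, -1))
    energy = (np.abs(responses) ** 2).sum(axis=(1, 2))
    total = energy.sum()
    return energy / total if total > 0 else energy


def classify_hierarchical(features, group_centroids, script_centroids):
    """Stage 1 picks the nearest group centroid; stage 2 picks the nearest
    script centroid within that group (nearest-centroid rule is an
    assumption)."""
    group = min(group_centroids,
                key=lambda g: np.linalg.norm(features - group_centroids[g]))
    candidates = script_centroids[group]
    script = min(candidates,
                 key=lambda s: np.linalg.norm(features - candidates[s]))
    return group, script
```

In use, `orientation_energies` would be applied to a grayscale text block, and the centroid dictionaries would be estimated from labelled training blocks. Because the feature is computed over the whole block, no line or character boundaries are needed at any point.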