Script and Language Identification from Document Images

Authors:
G. S. Peake; T. N. Tan
Venue:
DIA '97 Proceedings of the 1997 Workshop on Document Image Analysis
Year:
1997

Abstract

In this paper we present a detailed review of current script and language identification techniques. Our main criticism of existing techniques is that most of them rely on character segmentation. We then present a new texture-analysis method for script identification that does not require character segmentation. Simple processing of the document image produces a uniform text block on which texture analysis can be performed. Multi-channel (Gabor) filters and grey-level co-occurrence matrices are used in independent experiments to extract texture features. Test documents are classified with a k-NN classifier, based on the features of training documents. Initial results are very promising: over 95% accuracy in classifying 105 test documents from 7 languages. The method is robust to noise and to the presence of foreign characters or numerals, and can be applied to very small amounts of text.
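
The texture pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no filter parameters, co-occurrence statistics, or value of k, so the function names (gabor_kernel, gabor_features, glcm_features, knn_predict), the frequencies and orientations, the kernel size, the three co-occurrence statistics, and k=3 are all assumptions chosen for illustration. The pre-processing that produces the uniform text block is not shown, and synthetic stripe patterns stand in for text blocks.

```python
import numpy as np

def gabor_kernel(freq, theta, sigma=3.0, size=15):
    """Complex Gabor kernel: a Gaussian envelope times a plane wave.
    freq is in cycles/pixel; sigma and size are assumed values."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)  # coordinate along the wave
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    return envelope * np.exp(2j * np.pi * freq * xr)

def gabor_features(img, freqs=(0.1, 0.25),
                   thetas=(0.0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)):
    """Mean and standard deviation of each channel's response magnitude.
    Filtering uses the FFT, so it is a circular convolution; img should be
    at least as large as the kernel."""
    img = np.asarray(img, dtype=float)
    img = img - img.mean()  # remove the DC component
    spectrum = np.fft.fft2(img)
    feats = []
    for f in freqs:
        for t in thetas:
            kernel_ft = np.fft.fft2(gabor_kernel(f, t), s=img.shape)
            response = np.abs(np.fft.ifft2(spectrum * kernel_ft))
            feats.extend([response.mean(), response.std()])
    return np.array(feats)

def glcm_features(img, dx=1, dy=0, levels=2):
    """Energy, contrast, and homogeneity of a symmetric, normalised grey-level
    co-occurrence matrix for the non-negative pixel offset (dy, dx).
    img must hold integer grey levels in [0, levels)."""
    img = np.asarray(img, dtype=int)
    h, w = img.shape
    a = img[0:h - dy, 0:w - dx].ravel()  # reference pixels
    b = img[dy:h, dx:w].ravel()          # their neighbours at the offset
    glcm = np.zeros((levels, levels), dtype=float)
    np.add.at(glcm, (a, b), 1.0)         # count co-occurring level pairs
    glcm = glcm + glcm.T                 # make the matrix symmetric
    glcm /= glcm.sum()                   # convert counts to probabilities
    i, j = np.indices(glcm.shape)
    energy = np.sum(glcm**2)
    contrast = np.sum(glcm * (i - j)**2)
    homogeneity = np.sum(glcm / (1.0 + np.abs(i - j)))
    return np.array([energy, contrast, homogeneity])

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training vectors nearest to x (Euclidean)."""
    train_X = np.asarray(train_X, dtype=float)
    train_y = np.asarray(train_y)
    dists = np.linalg.norm(train_X - np.asarray(x, dtype=float), axis=1)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Synthetic stand-ins for two "scripts": vertical and horizontal stripes.
vertical = np.tile(np.array([0, 0, 1, 1]), (32, 8))  # 32 x 32, varies along x
horizontal = vertical.T                               # varies along y
train_imgs = [vertical, np.roll(vertical, 1, axis=1),
              horizontal, np.roll(horizontal, 1, axis=0)]
train_labels = np.array(["A", "A", "B", "B"])
train_feats = np.array([gabor_features(im) for im in train_imgs])
query = np.roll(horizontal, 2, axis=0)
print(knn_predict(train_feats, train_labels, gabor_features(query), k=3))
```

The same knn_predict call works unchanged on glcm_features vectors, which mirrors the abstract's use of the two feature types in independent experiments. In practice the feature dimensions would usually be normalised before computing distances; that step is omitted here for brevity.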