Texture for Script Identification
IEEE Transactions on Pattern Analysis and Machine Intelligence
In this paper we present a detailed review of current script and language identification techniques. The main criticism of existing techniques is that most of them rely on character segmentation. We then present a new texture-based method for script identification that does not require character segmentation. Simple processing of a document image produces a uniform text block on which texture analysis can be performed. In independent experiments, texture features are extracted with multi-channel Gabor filters and with grey level co-occurrence matrices (GLCMs). Test documents are classified against the features of training documents using a k-nearest-neighbour (k-NN) classifier. Initial results are very promising, with over 95% accuracy in classifying 105 test documents from seven languages. The method is robust to noise and to the presence of foreign characters or numerals, and it can be applied to very small amounts of text.
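The abstract names the ingredients (GLCM features, multi-channel Gabor features, k-NN classification) but not their settings. The following is a minimal sketch in Python with NumPy. It assumes the uniform text block has already been produced and quantized to integer grey levels. The offsets, number of grey levels, Gabor frequencies, orientations and envelope width, the contrast/energy/homogeneity statistics, the z-score normalization and the default k are all illustrative assumptions. The function names are hypothetical and are not taken from the paper.

```python
import numpy as np
from collections import Counter


def glcm(img, dx, dy, levels=8):
    """Symmetric, normalized grey level co-occurrence matrix for offset (dx, dy).

    Assumes `img` holds integers in [0, levels).
    """
    h, w = img.shape
    # Keep only pixel pairs where both (r, c) and (r + dy, c + dx) lie inside the image.
    r0, r1 = max(0, -dy), min(h, h - dy)
    c0, c1 = max(0, -dx), min(w, w - dx)
    a = img[r0:r1, c0:c1].ravel()
    b = img[r0 + dy:r1 + dy, c0 + dx:c1 + dx].ravel()
    m = np.zeros((levels, levels))
    np.add.at(m, (a, b), 1)  # accumulate pair counts (handles repeated indices)
    m = m + m.T              # count each pair in both directions
    return m / m.sum()


def glcm_features(img, levels=8, offsets=((1, 0), (0, 1), (1, 1), (-1, 1))):
    """Contrast, energy and homogeneity for each (dx, dy) offset (assumed feature set)."""
    i, j = np.indices((levels, levels))
    feats = []
    for dx, dy in offsets:
        p = glcm(img, dx, dy, levels)
        feats += [np.sum(p * (i - j) ** 2),           # contrast
                  np.sum(p ** 2),                     # energy (angular second moment)
                  np.sum(p / (1.0 + np.abs(i - j)))]  # homogeneity
    return np.array(feats)


def gabor_kernel(freq, theta, sigma=3.0, size=15):
    """Real (even-symmetric) Gabor kernel with an isotropic Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    return np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2)) * np.cos(2 * np.pi * freq * xr)


def gabor_features(img, freqs=(0.1, 0.2), n_orient=4):
    """Mean absolute response and standard deviation for each filter channel."""
    spectrum = np.fft.fft2(img.astype(float))
    feats = []
    for f in freqs:
        for k in range(n_orient):
            kern = gabor_kernel(f, np.pi * k / n_orient)
            # Circular convolution via the FFT; the output is circularly shifted,
            # which does not change these global statistics.
            resp = np.real(np.fft.ifft2(spectrum * np.fft.fft2(kern, s=img.shape)))
            feats += [np.abs(resp).mean(), resp.std()]
    return np.array(feats)


def knn_classify(train_feats, train_labels, x, k=3):
    """Majority vote among the k nearest training vectors (z-scored Euclidean distance)."""
    mu = train_feats.mean(axis=0)
    sd = train_feats.std(axis=0) + 1e-12  # avoid division by zero for constant features
    dist = np.linalg.norm((train_feats - mu) / sd - (x - mu) / sd, axis=1)
    nearest = np.argsort(dist, kind="stable")[:k]
    return Counter(train_labels[n] for n in nearest).most_common(1)[0][0]
```

In use, each training block's feature vector (from `glcm_features` or `gabor_features`, kept separate to mirror the independent experiments) would be computed once and stacked into `train_feats`. A test block's vector is then passed to `knn_classify`. The abstract does not say whether the authors normalize features or which k they use, so both choices here are placeholders.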