We report an algorithm to identify the script of each word in a document image. We start with a bi-script scenario, which is later extended to tri-script and then to eleven-script scenarios. For each script, a database of 20,000 words in different font styles and sizes has been collected and used. The effectiveness of Gabor and discrete cosine transform (DCT) features has been evaluated independently using nearest-neighbor, linear discriminant, and support vector machine (SVM) classifiers. Gabor features combined with either the nearest-neighbor or the SVM classifier give promising results: over 98% accuracy in the bi-script and tri-script cases and above 89% in the eleven-script scenario.
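The abstract does not spell out how the features are computed, so the following is only a minimal sketch of the general pipeline it names (per-word Gabor or DCT features followed by a nearest-neighbor classifier). It assumes a small bank of complex Gabor filters whose frequencies, orientations, sigma = 3 and 15 x 15 kernel size are illustrative choices, not the paper's settings; the DCT feature is assumed to be the low-frequency corner of an orthonormal 2-D DCT-II; and the classifier is a plain Euclidean 1-nearest-neighbor. The stripe images and the labels `script_A`/`script_B` are hypothetical stand-ins for real word images.

```python
import numpy as np

# Hypothetical filter-bank settings: the abstract does not give the frequencies,
# orientations, or kernel size used in the paper.
FREQS = (0.1, 0.2)   # cycles per pixel
N_ORIENT = 4         # orientations 0, 45, 90, 135 degrees

def gabor_kernel(freq, theta, sigma=3.0, size=15):
    """Complex Gabor kernel: Gaussian envelope times a complex sinusoid along theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    return envelope * np.exp(2j * np.pi * freq * x_rot)

def convolve_full(img, kernel):
    """Linear 2-D convolution via zero-padded FFTs (no wrap-around)."""
    shape = (img.shape[0] + kernel.shape[0] - 1, img.shape[1] + kernel.shape[1] - 1)
    return np.fft.ifft2(np.fft.fft2(img, s=shape) * np.fft.fft2(kernel, s=shape))

def gabor_features(img):
    """Mean and standard deviation of the response magnitude per (freq, theta)."""
    feats = []
    for f in FREQS:
        for k in range(N_ORIENT):
            mag = np.abs(convolve_full(img, gabor_kernel(f, k * np.pi / N_ORIENT)))
            feats.extend([mag.mean(), mag.std()])
    return np.array(feats)

def dct_matrix(n):
    """Orthonormal DCT-II basis; row k is the k-th cosine frequency."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] /= np.sqrt(2.0)
    return m

def dct_features(img, keep=4):
    """Low-frequency keep x keep corner of the 2-D DCT-II, flattened."""
    h, w = img.shape
    coeffs = dct_matrix(h) @ img @ dct_matrix(w).T
    return coeffs[:keep, :keep].ravel()

def nearest_neighbor(train_X, train_y, x):
    """1-NN: label of the training vector closest to x in Euclidean distance."""
    return train_y[int(np.argmin(np.linalg.norm(train_X - x, axis=1)))]

# Toy demo: synthetic stripe textures stand in for word images of two scripts.
rng = np.random.default_rng(0)

def fake_word(vertical, size=32):
    """Hypothetical 'word image': vertical or horizontal stripes plus mild noise."""
    stripes = ((np.arange(size) // 3) % 2).astype(float)
    img = np.tile(stripes, (size, 1)) if vertical else np.tile(stripes[:, None], (1, size))
    return img + 0.1 * rng.standard_normal((size, size))

train_y = ["script_A"] * 3 + ["script_B"] * 3
train_X = np.array([gabor_features(fake_word(v)) for v in [True] * 3 + [False] * 3])
for vertical in (True, False):
    print(nearest_neighbor(train_X, train_y, gabor_features(fake_word(vertical))))
```

Because Gabor filters are tuned to both frequency and orientation, the two stripe textures produce clearly separated feature vectors, which is the property the sketch is meant to illustrate. Replacing `gabor_features` with `dct_features` lets the two feature types be compared on the same classifier, and the 1-NN step could be swapped for a linear discriminant or an SVM without changing the feature code.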