Automatic Script Identification From Document Images Using Cluster-Based Templates

Authors:
Judith Hochberg;Patrick Kelly;Timothy Thomas;Lila Kerns
Venue:
IEEE Transactions on Pattern Analysis and Machine Intelligence
Year:
1997

Citing 0
Cited 30

Rotation Invariant Texture Features and Their Use in Automatic Script Identification

IEEE Transactions on Pattern Analysis and Machine Intelligence
Twenty Years of Document Image Analysis in PAMI

IEEE Transactions on Pattern Analysis and Machine Intelligence
Script Identification in Printed Bilingual Documents

DAS '02 Proceedings of the 5th International Workshop on Document Analysis Systems V
Gabor Filter Based Multi-class Classifier for Scanned Document Images

ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 2
Automatic Feature Selection with Applications to Script Identification of Degraded Documents

ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 2
Multi-Script Line identification from Indian Documents

ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 2
Automated layout recognition

Proceedings of the 1st ACM workshop on Hardcopy document processing
Texture for Script Identification

IEEE Transactions on Pattern Analysis and Machine Intelligence
Identifying Script on Word-Level with Informational Confidence

ICDAR '05 Proceedings of the Eighth International Conference on Document Analysis and Recognition
Script Identification Using Steerable Gabor Filters

ICDAR '05 Proceedings of the Eighth International Conference on Document Analysis and Recognition
Automatic document orientation detection and categorization through document vectorization

MULTIMEDIA '06 Proceedings of the 14th annual ACM international conference on Multimedia
Script and Language Identification in Noisy and Degraded Document Images

IEEE Transactions on Pattern Analysis and Machine Intelligence
Word level multi-script identification

Pattern Recognition Letters
Word-Wise Thai and Roman Script Identification

ACM Transactions on Asian Language Information Processing (TALIP)
Curvature feature distribution based classification of Indian scripts from document images

Proceedings of the International Workshop on Multilingual OCR
Combined script and page orientation estimation using the Tesseract OCR engine

Proceedings of the International Workshop on Multilingual OCR
Orientation detection of major Indian scripts

Proceedings of the International Workshop on Multilingual OCR
Language identification for handwritten document images using a shape codebook

Pattern Recognition
Automatic writer identification framework for online handwritten documents using character prototypes

Pattern Recognition
Script and language identification in degraded and distorted document images

AAAI'06 Proceedings of the 21st national conference on Artificial intelligence - Volume 1
Local features-based script recognition from printed bilingual document images

International Journal of Computer Applications in Technology
Word level identification of Kannada, Hindi and English scripts from a tri-lingual document

International Journal of Computational Vision and Robotics
Language identification in degraded and distorted document images

DAS'06 Proceedings of the 7th international conference on Document Analysis Systems
Bangla/English script identification based on analysis of connected component profiles

DAS'06 Proceedings of the 7th international conference on Document Analysis Systems
Script identification from Indian documents

DAS'06 Proceedings of the 7th international conference on Document Analysis Systems
HVS inspired system for script identification in Indian multi-script documents

DAS'06 Proceedings of the 7th international conference on Document Analysis Systems
Performance analysis of feature extractors and classifiers for script recognition of English and Gurmukhi words

Proceeding of the workshop on Document Analysis and Recognition
An empirical intrinsic mode based characterization of Indian scripts

Proceeding of the workshop on Document Analysis and Recognition
Word level script recognition for Uighur document mixed with English script

Proceedings of the 4th International Workshop on Multilingual OCR
Multilingual OCR research and applications: an overview

Proceedings of the 4th International Workshop on Multilingual OCR

Quantified Score

Hi-index	0.14

Abstract

We describe an automated script identification system for typeset document images. Templates for each script are created by clustering textual symbols from a training set. Symbols from new images are compared to the templates to find the best script. Our current system processes thirteen scripts with minimal preprocessing and high accuracy.
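
The template idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the representation of a symbol as a short fixed-length pixel vector, the greedy clustering rule, the similarity measure, the 0.8 threshold, and all function names are assumptions made for illustration only.

```python
def similarity(a, b):
    """Similarity in [0, 1]: 1 minus the mean absolute pixel difference."""
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def build_templates(symbols, threshold=0.8):
    """Greedy clustering (an assumed stand-in for the paper's clustering step).

    Each symbol joins the first cluster whose centroid is at least
    `threshold` similar; otherwise it starts a new cluster.
    Returns one centroid (template) per cluster.
    """
    clusters = []  # each entry: [running pixel sums, member count]
    for s in symbols:
        for c in clusters:
            centroid = [v / c[1] for v in c[0]]
            if similarity(s, centroid) >= threshold:
                c[0] = [v + x for v, x in zip(c[0], s)]
                c[1] += 1
                break
        else:
            clusters.append([list(s), 1])
    return [[v / n for v in total] for total, n in clusters]


def train(training_set, threshold=0.8):
    """Build a template list for every script in the training set."""
    return {script: build_templates(symbols, threshold)
            for script, symbols in training_set.items()}


def identify_script(symbols, templates):
    """Score each script by the mean best-template similarity of the symbols."""
    scores = {}
    for script, temps in templates.items():
        best = [max(similarity(s, t) for t in temps) for s in symbols]
        scores[script] = sum(best) / len(best)
    return max(scores, key=scores.get), scores


# Toy data: 4-pixel "symbols" for two hypothetical scripts.
training_set = {
    "A": [(1, 1, 0, 0), (1, 1, 0, 0), (0, 0, 1, 1)],
    "B": [(1, 0, 1, 0), (0, 1, 0, 1)],
}
templates = train(training_set)
best_script, scores = identify_script([(1, 1, 0, 0), (0, 0, 1, 1)], templates)
print(best_script, scores)
```

In this toy run the two test symbols match script "A" templates exactly, so "A" receives the higher score. A real system would first have to extract and normalize symbols from the page image, which this sketch leaves out.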