Script based text identification: a multi-level architecture

Authors:
Ehtesham Hassan; Ritu Garg; Santanu Chaudhury; M. Gopal
Affiliations:
IIT Delhi; IIT Delhi; IIT Delhi; IIT Delhi
Venue:
Proceedings of the 2011 Joint Workshop on Multilingual OCR and Analytics for Noisy Unstructured Text Data
Year:
2011

Abstract

Script identification in multilingual document collections has numerous applications in document image analysis, such as indexing and retrieval, and as an initial step towards optical character recognition. In this paper, we propose a novel hierarchical framework for script identification in bilingual documents. The framework takes a top-down approach, performing script identification at the page, block/paragraph, and word levels in successive stages. We use texture- and shape-based information embedded in the documents at each level for feature extraction. Prediction at each level of the hierarchy is performed by a Support Vector Machine (SVM) and a rejection-based classifier defined using AdaBoost. Experimental evaluation of the proposed approach on document collections in Hindi/English and Bangla/English scripts has shown promising results.
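
The top-down flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a classifier at one level may reject a region (return no label) and that a rejected region is then passed to the next finer level; the abstract does not spell out this control flow. The names `Region`, `make_threshold_classifier`, and `identify` are hypothetical, and the threshold rule is only a toy stand-in for the SVM and AdaBoost-based classifiers.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Region:
    """A page, block, or word; `children` holds the next finer level."""
    name: str
    features: List[float]
    children: List["Region"] = field(default_factory=list)


# Assumed convention: a classifier returns a script label, or None to reject.
Classifier = Callable[[List[float]], Optional[str]]


def make_threshold_classifier(low: float, high: float,
                              labels=("Hindi", "English")) -> Classifier:
    """Toy stand-in for a trained classifier with a reject option.

    Scores a region by its mean feature value and rejects when the
    score falls strictly between `low` and `high`.
    """
    def classify(features: List[float]) -> Optional[str]:
        score = sum(features) / len(features)
        if score <= low:
            return labels[0]
        if score >= high:
            return labels[1]
        return None  # ambiguous: reject
    return classify


def identify(region: Region, classifiers: List[Classifier],
             level: int = 0) -> Dict[str, Optional[str]]:
    """Label `region` top-down, descending one level on rejection."""
    label = classifiers[level](region.features)
    at_last_level = level + 1 >= len(classifiers)
    if label is not None or not region.children or at_last_level:
        return {region.name: label}
    result: Dict[str, Optional[str]] = {}
    for child in region.children:
        result.update(identify(child, classifiers, level + 1))
    return result


# Hypothetical page: one clean block and one mixed block with two words.
page = Region("page", [0.5], [
    Region("block1", [0.1]),
    Region("block2", [0.5], [Region("word1", [0.2]), Region("word2", [0.9])]),
])
levels = [make_threshold_classifier(0.3, 0.7)] * 3  # page, block, word
labels_by_region = identify(page, levels)
print(labels_by_region)
```

In this sketch the page and the second block are rejected as ambiguous, so labels are assigned at the finest level needed: `block1` as a whole, and `word1` and `word2` individually. In practice each level would use its own trained model and its own texture or shape features rather than a shared threshold.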