Multi-Script Line identification from Indian Documents

Authors:
U. Pal;S. Sinha;B. B. Chaudhuri
Venue:
ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 2
Year:
2003


Abstract

A document page may contain two or more different scripts. For Optical Character Recognition (OCR) of such a document page, it is necessary to separate the different scripts before feeding them to their individual OCR systems. In this paper, an automatic scheme is presented to identify text lines of different Indian scripts from a document. For the separation task, the scripts are first grouped into a few classes according to script characteristics. Next, features based on the water reservoir principle, contour tracing, profiles, etc. are employed to identify them without any expensive OCR-like algorithms. At present, the system has an overall accuracy of about 97.52%.
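
The abstract names profile and water reservoir features without giving details, so the following is only a minimal sketch of what such features might look like on a binary image (1 = ink, 0 = background). The function names horizontal_profile and top_reservoir_area are hypothetical, and the reservoir computation is a simplified assumption that uses only the top profile of a shape; it is not the authors' implementation.

```python
def horizontal_profile(image):
    """Count foreground (1) pixels in each row of a binary image."""
    return [sum(row) for row in image]


def top_reservoir_area(image):
    """Approximate how much 'water' a shape holds when filled from the top.

    Simplifying assumption: only the top profile is used. Each column's
    height is measured from the bottom row up to its topmost foreground
    pixel, and water settles up to the lower of the tallest columns to
    its left and to its right.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0

    # Height of each column, measured from the bottom of the image.
    heights = []
    for c in range(cols):
        top = next((r for r in range(rows) if image[r][c]), rows)
        heights.append(rows - top)

    # Tallest column seen so far when scanning left to right.
    left_max, running = [], 0
    for h in heights:
        running = max(running, h)
        left_max.append(running)

    # Scan right to left and accumulate the water held above each column.
    area, running = 0, 0
    for c in range(cols - 1, -1, -1):
        running = max(running, heights[c])
        area += min(left_max[c], running) - heights[c]
    return area


# A cup-like shape that is open at the top holds water; a shape that is
# closed at the top does not.
cup = [
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
]
print(horizontal_profile(cup))
print(top_reservoir_area(cup))
```

In a setting like the one the abstract describes, values of this kind could serve as inexpensive per-line features for a classifier, avoiding full character recognition; how the paper actually combines them is not stated in the abstract.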