Script Line Separation from Indian Multi-Script Documents

Authors:
U. Pal; B. B. Chaudhuri
Venue:
ICDAR '99 Proceedings of the Fifth International Conference on Document Analysis and Recognition
Year:
1999

Citing 0
Cited 13

Multi-Script Line identification from Indian Documents
ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 2

Online Handwritten Script Recognition
IEEE Transactions on Pattern Analysis and Machine Intelligence

Feature extraction and classification for bilingual script (Gurmukhi and Roman)
ACST'07 Proceedings of the third conference on IASTED International Conference: Advances in Computer Science and Technology

Word level multi-script identification
Pattern Recognition Letters

A novel framework for automatic sorting of postal documents with multi-script address blocks
Pattern Recognition

Local features-based script recognition from printed bilingual document images
International Journal of Computer Applications in Technology

Word level identification of Kannada, Hindi and English scripts from a tri-lingual document
International Journal of Computational Vision and Robotics

Script based text identification: a multi-level architecture
Proceedings of the 2011 Joint Workshop on Multilingual OCR and Analytics for Noisy Unstructured Text Data

OCR of printed Telugu text with high recognition accuracies
ICVGIP'06 Proceedings of the 5th Indian conference on Computer Vision, Graphics and Image Processing

Bangla/English script identification based on analysis of connected component profiles
DAS'06 Proceedings of the 7th international conference on Document Analysis Systems

HVS inspired system for script identification in Indian multi-script documents
DAS'06 Proceedings of the 7th international conference on Document Analysis Systems

Performance analysis of feature extractors and classifiers for script recognition of English and Gurmukhi words
Proceeding of the workshop on Document Analysis and Recognition

An empirical intrinsic mode based characterization of Indian scripts
Proceeding of the workshop on Document Analysis and Recognition

Abstract

In a multilingual country like India, a single document page may contain more than one script. Under the three-language formula, a document may be printed in English, Devanagari, and one of the other official Indian languages. To run OCR on such a page, the three scripts must be separated before each is passed to the OCR system for that script. This paper presents an automatic technique for separating text lines by script, using script characteristics and shape-based features. At present, the system has an overall accuracy of about 98.5%.