Trainable Script Identification Strategies for Indian Languages

Authors: Santanu Chaudhury; Rabindra Sheth
Venue: ICDAR '99 Proceedings of the Fifth International Conference on Document Analysis and Recognition
Year: 1999

Citing 0
Cited 11

Script Identification in Printed Bilingual Documents
DAS '02 Proceedings of the 5th International Workshop on Document Analysis Systems V

Devising Interactive Access Techniques for Indian Language Document Images
ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 2

Word level multi-script identification
Pattern Recognition Letters

Curvature feature distribution based classification of Indian scripts from document images
Proceedings of the International Workshop on Multilingual OCR

Combined script and page orientation estimation using the Tesseract OCR engine
Proceedings of the International Workshop on Multilingual OCR

A novel framework for automatic sorting of postal documents with multi-script address blocks
Pattern Recognition

Local features-based script recognition from printed bilingual document images
International Journal of Computer Applications in Technology

Bangla/English script identification based on analysis of connected component profiles
DAS '06 Proceedings of the 7th International Conference on Document Analysis Systems

Script identification from Indian documents
DAS '06 Proceedings of the 7th International Conference on Document Analysis Systems

HVS inspired system for script identification in Indian multi-script documents
DAS '06 Proceedings of the 7th International Conference on Document Analysis Systems

Performance analysis of feature extractors and classifiers for script recognition of English and Gurmukhi words
Proceeding of the workshop on Document Analysis and Recognition

Abstract

Identifying the script in a document page image is of primary importance for any system that processes multilingual documents. This paper proposes three trainable classification schemes for identifying Indian scripts. The first scheme is based on a frequency-domain representation of the horizontal profile of the text blocks. The other two schemes use connected components extracted from the text region: one applies a novel Gabor filter-based feature extraction scheme to the connected components, and the other uses the frequency distribution of their width-to-height ratios. Experiments show that the Gabor filter-based scheme gives the most reliable performance, while the other two techniques are computationally more efficient.
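
The abstract gives no implementation details, so the following is only an illustrative sketch of the two computationally cheaper feature types it mentions, not the authors' method. It assumes a binarized text block (1 = ink), a plain discrete Fourier transform as the frequency-domain representation, and connected-component bounding boxes that have already been extracted. The function names, bin edges, and toy data are all hypothetical. The Gabor filter-based scheme and the trainable classifier itself are omitted.

```python
import cmath
import math


def horizontal_profile(image):
    """Count ink pixels in each row of a binary image (1 = ink, 0 = background)."""
    return [sum(row) for row in image]


def dft_magnitudes(signal, n_coeffs):
    """Magnitudes of the first n_coeffs DFT coefficients (naive O(n^2) DFT)."""
    n = len(signal)
    mags = []
    for k in range(n_coeffs):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)
        )
        mags.append(abs(coeff))
    return mags


def aspect_ratio_histogram(boxes, bin_edges):
    """Normalized histogram of width/height ratios for (width, height) boxes."""
    counts = [0] * (len(bin_edges) - 1)
    for width, height in boxes:
        ratio = width / height
        for i in range(len(counts)):
            if bin_edges[i] <= ratio < bin_edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts] if total else counts


# Toy text block (assumed data): two text lines, each two rows thick.
block = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
]
profile = horizontal_profile(block)
profile_features = dft_magnitudes(profile, 4)

# Hypothetical (width, height) bounding boxes of connected components.
boxes = [(4, 8), (6, 6), (12, 6), (5, 10)]
ratio_features = aspect_ratio_histogram(boxes, [0.0, 0.75, 1.5, 3.0])
```

Either feature vector could then be passed to any trainable classifier. The periodic spacing of text lines shows up as peaks in the profile spectrum, while the ratio histogram summarizes the typical shapes of the components.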