Language identification for printed text independent of segmentation

Authors:
S. L. Wood; Xiaozhong Yao; K. Krishnamurthi; L. Dang
Venue:
ICIP '95 Proceedings of the 1995 International Conference on Image Processing (Vol. 3)
Year:
1995

Citing 0
Cited 10

Script Identification in Printed Bilingual Documents
DAS '02 Proceedings of the 5th International Workshop on Document Analysis Systems V

Language Identification of Character Images Using Machine Learning Techniques
ICDAR '05 Proceedings of the Eighth International Conference on Document Analysis and Recognition

Word level multi-script identification
Pattern Recognition Letters

Curvature feature distribution based classification of Indian scripts from document images
Proceedings of the International Workshop on Multilingual OCR

Word level identification of Kannada, Hindi and English scripts from a tri-lingual document
International Journal of Computational Vision and Robotics

Contribution to the discrimination of the medieval manuscript texts: application in the palaeography
DAS'06 Proceedings of the 7th international conference on Document Analysis Systems

Bangla/English script identification based on analysis of connected component profiles
DAS'06 Proceedings of the 7th international conference on Document Analysis Systems

Script identification from indian documents
DAS'06 Proceedings of the 7th international conference on Document Analysis Systems

HVS inspired system for script identification in indian multi-script documents
DAS'06 Proceedings of the 7th international conference on Document Analysis Systems

Performance analysis of feature extractors and classifiers for script recognition of English and Gurmukhi words
Proceeding of the workshop on Document Analysis and Recognition

Abstract

This paper presents efficient algorithms for determining the language classification of machine-generated documents without requiring the identification of individual characters. Such algorithms may be useful for sorting facsimile documents as they arrive, so that appropriate routing and secondary analysis, which may include OCR, is selected for each document. They may also prove useful as a component of a content-addressable document access system. Numerous reported efforts attempt to segment printed documents into homogeneous regions using Hough transforms, hidden Markov models, morphological filtering, and neural networks. However, language identification can be accomplished without explicit segmentation using the less computationally intensive methods described.