Language identification for printed text independent of segmentation

  • Authors:
  • S. L. Wood; Xiaozhong Yao; K. Krishnamurthi; L. Dang

  • Venue:
  • ICIP '95: Proceedings of the 1995 International Conference on Image Processing, Volume 3
  • Year:
  • 1995

Abstract

This paper presents efficient algorithms for determining the language of machine-generated documents without requiring the identification of individual characters. Such algorithms may be useful for sorting incoming facsimile documents so that the appropriate routing and secondary analysis, which may include OCR, can be selected for each document as it arrives. They may also prove useful as a component of a content-addressable document access system. Numerous reported efforts have attempted to segment printed documents into homogeneous regions using Hough transforms, hidden Markov models, morphological filtering, and neural networks. However, language identification can be accomplished without explicit segmentation, using the less computationally intensive methods described here.
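The abstract does not say which features the authors use, so the following is only a hypothetical illustration of the general idea of classifying a page without segmenting it. It is not the method of Wood et al. The sketch assumes a binarized page image (1 = ink, 0 = background). It computes a global statistic: the autocorrelation of the page's horizontal projection profile. This statistic depends only on the row-to-row structure of the ink, not on where lines, words, or characters sit. The sketch then picks the nearest labeled reference vector. The names `row_profile`, `autocorrelation`, `classify`, and the `references` dictionary are all hypothetical.

```python
from typing import Dict, List

Image = List[List[int]]  # hypothetical binarized page: 1 = ink pixel, 0 = background


def row_profile(img: Image) -> List[int]:
    """Horizontal projection profile: ink-pixel count in every row of the whole page.

    No text lines, words, or characters are located; every row is simply summed.
    """
    return [sum(row) for row in img]


def autocorrelation(profile: List[int], max_lag: int) -> List[float]:
    """Mean-removed autocorrelation for lags 0..max_lag, normalized so lag 0 equals 1.

    Because it depends only on the lag, the result is largely insensitive to
    where the text block starts on the page.
    """
    n = len(profile)
    mean = sum(profile) / n
    centered = [p - mean for p in profile]
    energy = sum(c * c for c in centered)
    if energy == 0:  # blank or uniform page: no structure to describe
        return [0.0] * (max_lag + 1)
    return [
        sum(centered[i] * centered[i + k] for i in range(n - k)) / energy
        for k in range(max_lag + 1)
    ]


def classify(img: Image, references: Dict[str, List[float]], max_lag: int) -> str:
    """Return the label whose reference feature vector is nearest (L1 distance) to the page's."""
    feature = autocorrelation(row_profile(img), max_lag)
    return min(
        references,
        key=lambda label: sum(abs(a - b) for a, b in zip(feature, references[label])),
    )
```

The reference vectors would come from pages of known language. Whether this particular feature separates real languages is an assumption the sketch does not test. The point is only that each step costs a single pass over the pixels or the profile, with no region segmentation.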