Composite Script Identification and Orientation Detection for Indian Text Images

Authors:
Shamita Ghosh;Bidyut B. Chaudhuri
Venue:
ICDAR '11 Proceedings of the 2011 International Conference on Document Analysis and Recognition
Year:
2011

Citing 0
Cited 1

Word level script recognition for Uighur document mixed with English script

Proceedings of the 4th International Workshop on Multilingual OCR

Abstract

A major preprocessing step in multi-script OCR is identifying the script of the test document image. Published work on script identification usually assumes that the test image is in the correct (0°) orientation. However, a document may be fed to the system in the wrong orientation by mistake, for example at an angle near 180° or ±90°. In this paper we propose a script identification method that works under unknown orientation for all 11 official Indian scripts. We first estimate the skew and counter-rotate the document by the skew angle, which leaves the document either correctly oriented (0°) or upside down (180°). Script identification is then performed by a multi-stage tree classifier using features that are invariant to 0°/180° orientation. Finally, the orientation of the image is determined by a two-class classifier for each script. The proposed method has been tested on a variety of documents, and promising results have been obtained.
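
The abstract does not say which features the classifiers use, so the following Python sketch only illustrates the general idea and is not the authors' method. It assumes a binarized image stored as a NumPy array, and the two functions (invariant_feature and orientation_feature) are hypothetical names introduced here. The first feature is unchanged by a 180° rotation, which is the property needed for identifying the script before the orientation is known. The second feature flips sign under the same rotation, which is the kind of cue a two-class orientation classifier could use.

```python
import numpy as np

def invariant_feature(img):
    """Symmetrized row-ink profile (hypothetical feature).

    Adding the row sums to their own reverse gives a vector that is
    identical for an image and its 180-degree rotation.
    """
    rows = img.sum(axis=1).astype(float)
    return rows + rows[::-1]

def orientation_feature(img):
    """Ink in the top half minus ink in the bottom half (hypothetical feature).

    This value changes sign when the image is rotated by 180 degrees,
    so it can separate upright from upside-down images.
    """
    half = img.shape[0] // 2
    top = img[:half].sum()
    bottom = img[img.shape[0] - half:].sum()
    return float(top - bottom)

# Toy binary "text" image with most of its ink near the top.
img = np.zeros((6, 8), dtype=int)
img[1, :] = 1
img[2, 2:5] = 1
img[4, 3] = 1

# Two successive 90-degree rotations give the upside-down image.
rotated = np.rot90(img, 2)

print(np.array_equal(invariant_feature(img), invariant_feature(rotated)))
print(orientation_feature(img), orientation_feature(rotated))
```

On this toy image the first line printed is True and the second shows two values of equal magnitude and opposite sign. A real system would need features that also discriminate between scripts, which this sketch does not attempt.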