Script Identification Based on Morphological Reconstruction in Document Images

Authors:
B. V. Dhandra;P. Nagabhushan;Mallikarjun Hangarge;Ravindra Hegadi;V. S. Malemath
Affiliations:
Gulbarga University, Gulbarga-585106, Karnataka, India;Gulbarga University, Gulbarga-585106, Karnataka, India;Gulbarga University, Gulbarga-585106, Karnataka, India;Gulbarga University, Gulbarga-585106, Karnataka, India;Gulbarga University, Gulbarga-585106, Karnataka, India
Venue:
ICPR '06 Proceedings of the 18th International Conference on Pattern Recognition - Volume 02
Year:
2006

Citing 0
Cited 3

Word-Wise Thai and Roman Script Identification
ACM Transactions on Asian Language Information Processing (TALIP)

Local features-based script recognition from printed bilingual document images
International Journal of Computer Applications in Technology

Script based text identification: a multi-level architecture
Proceedings of the 2011 Joint Workshop on Multilingual OCR and Analytics for Noisy Unstructured Text Data

Abstract

This paper studies script identification in printed document images based on morphological reconstruction. The system was developed using 609 scanned document images representing the English, Hindi, Kannada, and Urdu scripts. It comprises a feature extractor and a classifier. The feature extractor works in two stages. In the first stage, morphological erosion and opening by reconstruction are applied to a document image in the horizontal, vertical, right-diagonal, and left-diagonal directions using a line structuring element. The length of the structuring element is fixed based on the average height of all connected components in the image. In the second stage, the average pixel distribution is computed for each of the resulting images. A nearest-neighbor analysis is used to classify new documents. Classification accuracy averaged 97% across the four scripts. The method is robust to noise and to variations in font size and style.
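
The two-stage pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; it makes several assumptions the abstract does not settle: the image is a binary list of rows (1 = ink), components are 8-connected, the structuring-element length is the rounded average component height, "average pixel distribution" is taken to mean the fraction of image pixels that remain set, and the classifier is a 1-nearest-neighbor rule with Euclidean distance. All function names and the direction vectors are hypothetical.

```python
from collections import deque
from math import dist

# Unit steps for the four line orientations (direction vectors are assumed).
DIRECTIONS = {"horizontal": (0, 1), "vertical": (1, 0),
              "right_diagonal": (-1, 1), "left_diagonal": (1, 1)}
NEIGHBORS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]


def flood(mask, seeds):
    """Return the set of 8-connected pixels of `mask` reachable from `seeds`."""
    rows, cols = len(mask), len(mask[0])
    seen = {(r, c) for r, c in seeds if mask[r][c]}
    queue = deque(seen)
    while queue:
        r, c = queue.popleft()
        for dr, dc in NEIGHBORS:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and mask[nr][nc] and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return seen


def average_component_height(img):
    """Mean bounding-box height over all 8-connected foreground components."""
    remaining = {(r, c) for r, row in enumerate(img) for c, v in enumerate(row) if v}
    heights = []
    while remaining:
        comp = flood(img, [next(iter(remaining))])
        rs = [r for r, _ in comp]
        heights.append(max(rs) - min(rs) + 1)
        remaining -= comp
    return sum(heights) / len(heights) if heights else 0.0


def erode(img, step, length):
    """Binary erosion by a centred line of `length` pixels along `step`."""
    rows, cols = len(img), len(img[0])
    dr, dc = step
    ks = range(-(length // 2), length - length // 2)

    def fits(r, c):
        # The line fits only if every pixel under it is inside the image and set.
        return all(0 <= r + k * dr < rows and 0 <= c + k * dc < cols
                   and img[r + k * dr][c + k * dc] for k in ks)

    return {(r, c) for r in range(rows) for c in range(cols) if fits(r, c)}


def features(img):
    """Eight densities: erosion and opening by reconstruction per direction."""
    length = max(1, round(average_component_height(img)))
    area = len(img) * len(img[0])
    feats = []
    for step in DIRECTIONS.values():
        marker = erode(img, step, length)
        # Binary opening by reconstruction: keep every component of the
        # original image that contains at least one eroded (marker) pixel.
        opened = flood(img, marker)
        feats += [len(marker) / area, len(opened) / area]
    return feats


def nearest_neighbor(sample, training):
    """Label of the training vector closest to `sample` (Euclidean distance)."""
    return min(training, key=lambda item: dist(sample, item[0]))[1]
```

In use, `features` would be computed for each labelled training image, and `nearest_neighbor(features(new_img), training)` would return the script label of the closest training vector, where `training` is a list of `(feature_vector, label)` pairs. The intuition is that scripts dominated by long horizontal strokes retain more pixels under the horizontal element, while scripts with more vertical or slanted strokes respond more strongly in the other directions.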