Local features-based script recognition from printed bilingual document images

Authors:
S. Abirami;D. Manjula
Affiliations:
Department of Computer Science and Engineering, Anna University, Guindy, Chennai 600 025, India.;Department of Computer Science and Engineering, Anna University, Guindy, Chennai 600 025, India
Venue:
International Journal of Computer Applications in Technology
Year:
2010

Abstract

Identifying the script and language of a biscript document is an important step in the design of an OCR system, since successful analysis and recognition depend on it. This paper presents an architecture for script recognition in bilingual (Tamil and English) document images. It addresses the challenges of recognition at the character level by predicting the script of a word image from its initial character, which lets it adapt to various font faces and sizes. The recogniser models every character as tetra bit values (TBV), which correspond to the spatial spread derived from the segmented grids of the character. A decision tree classifier (DTC) classifies the script from the patterns generated from the TBV. The resulting spatial features-based script recogniser (SFBSR) is trained and tested on bilingual document images consisting of various Tamil and English words to show its effectiveness at script identification. Classification accuracy on the training and testing sets is promising, and a comparison with various other techniques shows a significant performance improvement for SFBSR. The recogniser can be embedded in an OCR system ahead of its recognition stage.
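
The abstract does not define how the TBV are computed, so the following is only a rough sketch of one possible reading, not the authors' method. It assumes, purely for illustration, that a binarised character image is split into a 2x2 grid and that each cell contributes one bit indicating whether enough ink falls inside it. The function name `tetra_bits`, the 2x2 split, and the `threshold` parameter are all hypothetical.

```python
def tetra_bits(glyph, threshold=0.25):
    """Return four bits (top-left, top-right, bottom-left, bottom-right).

    glyph: non-empty list of equal-length rows of 0/1 pixels (1 = ink).
    A bit is 1 when the ink fraction in its quadrant reaches `threshold`.
    The 2x2 split and the threshold are illustrative assumptions,
    not details taken from the paper.
    """
    rows, cols = len(glyph), len(glyph[0])
    mid_r, mid_c = rows // 2, cols // 2
    # (row_start, row_end, col_start, col_end) for each quadrant
    quadrants = [
        (0, mid_r, 0, mid_c),
        (0, mid_r, mid_c, cols),
        (mid_r, rows, 0, mid_c),
        (mid_r, rows, mid_c, cols),
    ]
    bits = []
    for r0, r1, c0, c1 in quadrants:
        area = (r1 - r0) * (c1 - c0)
        ink = sum(glyph[r][c] for r in range(r0, r1) for c in range(c0, c1))
        # Guard against empty quadrants in very small images.
        bits.append(1 if area and ink / area >= threshold else 0)
    return tuple(bits)


# Toy 4x4 glyph: a vertical stroke along the left edge.
stroke = [
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
print(tetra_bits(stroke))  # (1, 0, 1, 0)
```

Under this reading, the bit pattern extracted from the initial character of each word would be the feature vector handed to a decision tree classifier, which then labels the word as Tamil or English.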