This paper reports an identification technique that detects the scripts and languages of noisy and degraded document images. In the proposed technique, scripts and languages are identified through document vectorization, which converts each document image into a document vector that characterizes the shape and frequency of the contained character or word images. Document images are vectorized by using vertical component cuts and character extremum points, both of which are tolerant to variation in text fonts and styles, to noise, and to various types of document degradation. For each script or language under study, a script or language template is first constructed through a training process. The scripts and languages of document images are then determined according to the distances between the converted document vectors and the pre-constructed script and language templates. Experimental results show that the proposed technique is accurate, easy to extend, and tolerant to noise and various types of document degradation.
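The template-matching step described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes each document has already been reduced to a sequence of discrete shape codes (the hypothetical symbols "a", "b", "c" stand in for features derived from vertical component cuts or extremum points), it builds each template as the mean of normalized frequency vectors, and it uses Euclidean distance, which the abstract does not specify. All function names are illustrative.

```python
from collections import Counter
from math import sqrt


def vectorize(codes, vocabulary):
    """Convert a sequence of shape codes into a normalized frequency vector."""
    counts = Counter(codes)
    total = sum(counts[c] for c in vocabulary)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[c] / total for c in vocabulary]


def build_template(training_docs, vocabulary):
    """Average the document vectors of one script or language (assumed training rule)."""
    vectors = [vectorize(doc, vocabulary) for doc in training_docs]
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def distance(u, v):
    """Euclidean distance; the actual metric is an assumption of this sketch."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def identify(codes, templates, vocabulary):
    """Return the label of the template closest to the document vector."""
    vec = vectorize(codes, vocabulary)
    return min(templates, key=lambda label: distance(vec, templates[label]))


# Hypothetical shape-code vocabulary and toy training data.
vocabulary = ["a", "b", "c"]
templates = {
    "script_A": build_template([["a", "a", "b"], ["a", "b", "a"]], vocabulary),
    "script_B": build_template([["c", "c", "b"], ["c", "b", "c"]], vocabulary),
}

print(identify(["a", "a", "a", "b"], templates, vocabulary))  # prints "script_A"
```

Under these assumptions, supporting an additional script or language only requires building one more template from its training documents, which matches the ease of extension the abstract claims.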