Most document recognition work to date has been performed on English text. Because of the large overlap of the character sets found in English and major Western European languages such as French and German, some extensions of the basic English capability to those languages have taken place. However, automatic language identification prior to optical character recognition is not commonly available and adds utility to such systems.

Languages and their scripts have attributes that make it possible to determine the language of a document automatically. Detection of the values of these attributes requires the recognition of particular features of the document image and, in the case of languages using Latin-based symbols, the character syntax of the underlying language.

We have developed techniques for distinguishing which language is represented in an image of text. This work is restricted to a small but important subset of the world's languages. The method first classifies the script into two broad classes: Han-based and Latin-based. This classification is based on the spatial relationships of features related to the upward concavities in character structures. Language identification within the Han script class (Chinese, Japanese, Korean) is performed by analysis of the distribution of optical density in the text images. We handle 23 Latin-based languages using a technique based on character shape codes, a representation of Latin text that is inexpensive to compute.
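As a rough illustration of the character shape code idea, the sketch below maps each character of a word to a coarse shape class and counts the resulting word tokens. It is a minimal sketch under stated assumptions, not the method described above: the class assignment (`A` for tall characters, `x` for x-height characters, `g` for descenders, `i` and `j` kept as their own classes) is a simplified, hypothetical one, the function names are invented for this example, and the input is ordinary text rather than a document image, where the classes would have to be derived from glyph geometry.

```python
from collections import Counter

# Hypothetical, simplified shape classes (an assumption for illustration,
# not the class set used in the work described above).
ASCENDERS = set("bdfhklt")   # lowercase letters that rise above the x-height
DESCENDERS = set("gpqy")     # lowercase letters that drop below the baseline


def shape_code(word: str) -> str:
    """Map each character of a word to a coarse shape class."""
    codes = []
    for ch in word:
        if ch.isupper() or ch.isdigit() or ch in ASCENDERS:
            codes.append("A")        # tall character
        elif ch in DESCENDERS:
            codes.append("g")        # descender
        elif ch in ("i", "j"):
            codes.append(ch)         # dotted characters keep their own class
        elif ch.isalpha():
            codes.append("x")        # remaining letters treated as x-height
        # punctuation and other symbols are dropped in this sketch
    return "".join(codes)


def token_profile(text: str) -> Counter:
    """Count shape-coded word tokens in whitespace-separated text."""
    return Counter(code for code in map(shape_code, text.split()) if code)


print(shape_code("the"))  # AAx
print(token_profile("the cat and the dog"))
```

Because many different words collapse onto the same token, the representation is cheap to compute and compare. A language identifier built on it could, for example, compare the most frequent tokens of a document with per-language token profiles; that comparison step is not shown here.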