Language identification for handwritten document images is an open document analysis problem. In this paper, we propose a novel approach to language identification for documents containing a mixture of handwritten and machine-printed text, using image descriptors constructed from a codebook of shape features. We encode local text structures using scale- and rotation-invariant codewords, each representing a segmentation-free shape feature that is generic enough to be detected repeatably. We learn a concise, structurally indexed shape codebook from training data by clustering and partitioning similar feature types through graph cuts. Our approach is easily extensible and does not require skew correction, scale normalization, or segmentation. We quantitatively evaluate our approach on a large real-world collection of 1512 document images in eight languages (Arabic, Chinese, English, Hindi, Japanese, Korean, Russian, and Thai), which contains a complex mixture of handwritten and machine-printed content. Experiments demonstrate the robustness and flexibility of our approach and show language identification performance that exceeds the state of the art.
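The codebook-and-descriptor idea can be illustrated with a minimal sketch. This is not the method of the paper: plain k-means stands in for the graph-cut partitioning, the feature vectors are assumed to be precomputed numeric arrays (the shape-feature extraction itself is not shown), and the names `build_codebook` and `describe` are hypothetical.

```python
import numpy as np


def build_codebook(features, k, iters=20, seed=0):
    """Cluster feature vectors into k codewords.

    Assumption: plain k-means is used here as a simple stand-in for the
    graph-cut partitioning described in the abstract.
    """
    rng = np.random.default_rng(seed)
    # Start from k distinct training features chosen at random.
    start = rng.choice(len(features), size=k, replace=False)
    centers = features[start].astype(float)
    for _ in range(iters):
        # Assign every feature to its nearest center.
        dists = np.linalg.norm(
            features[:, None, :] - centers[None, :, :], axis=2
        )
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers


def describe(features, codebook):
    """Return an L1-normalized histogram of nearest-codeword counts."""
    dists = np.linalg.norm(
        features[:, None, :] - codebook[None, :, :], axis=2
    )
    counts = np.bincount(dists.argmin(axis=1), minlength=len(codebook))
    return counts / counts.sum()


if __name__ == "__main__":
    # Toy data only: random vectors in place of real shape features.
    rng = np.random.default_rng(1)
    training = rng.normal(size=(200, 4))
    codebook = build_codebook(training, k=5)
    print(describe(training[:50], codebook))
```

Under these assumptions, each document is reduced to a fixed-length histogram over codewords, which any standard classifier could then consume to predict the language.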