Identifying Script on Word-Level with Informational Confidence

Authors:
Stefan Jaeger;Huanfeng Ma;David Doermann
Affiliations:
University of Maryland;University of Maryland;University of Maryland
Venue:
ICDAR '05 Proceedings of the Eighth International Conference on Document Analysis and Recognition
Year:
2005

Citing 9

Determination of the Script and Language Content of Document Images
IEEE Transactions on Pattern Analysis and Machine Intelligence

Automatic Script Identification From Document Images Using Cluster-Based Templates
IEEE Transactions on Pattern Analysis and Machine Intelligence

On Combining Classifiers
IEEE Transactions on Pattern Analysis and Machine Intelligence

Font Recognition Based on Global Texture Analysis
IEEE Transactions on Pattern Analysis and Machine Intelligence - Graph Algorithms and Computer Vision

A Tutorial on Support Vector Machines for Pattern Recognition
Data Mining and Knowledge Discovery

The Document Spectrum for Page Layout Analysis
IEEE Transactions on Pattern Analysis and Machine Intelligence

Language determination: natural language processing from scanned document images
ANLC '94 Proceedings of the fourth conference on Applied natural language processing

Informational Classifier Fusion
ICPR '04 Proceedings of the 17th International Conference on Pattern Recognition (ICPR'04), Volume 1

Using Informational Confidence Values for Classifier Combination: An Experiment with Combined On-Line/Off-Line Japanese Character Recognition
IWFHR '04 Proceedings of the Ninth International Workshop on Frontiers in Handwriting Recognition

Cited 3

Word level multi-script identification
Pattern Recognition Letters

Word-Wise Thai and Roman Script Identification
ACM Transactions on Asian Language Information Processing (TALIP)

Language identification for handwritten document images using a shape codebook
Pattern Recognition

Abstract

In this paper, we present a multiple classifier system for script identification. Applying a Gabor filter analysis of textures at the word level, our system identifies Latin and non-Latin words in bilingual printed documents. The classifier system comprises four different architectures based on nearest neighbors, weighted Euclidean distances, Gaussian mixture models, and support vector machines. We report results for Arabic, Chinese, Hindi, and Korean script. Moreover, we show that combining informational confidence values using the sum rule can consistently outperform the best single recognition rate.
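
The sum-rule combination mentioned in the abstract can be illustrated with a minimal sketch. It assumes each classifier has already produced one confidence value per script label on a comparable scale; how the paper derives its informational confidence values is not reproduced here. The function name `sum_rule`, the label strings, and all numbers are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def sum_rule(confidences: List[Dict[str, float]]) -> str:
    """Return the label with the highest summed confidence across classifiers."""
    totals: Dict[str, float] = {}
    for scores in confidences:
        for label, value in scores.items():
            totals[label] = totals.get(label, 0.0) + value
    # Pick the label whose accumulated confidence is largest.
    return max(totals, key=totals.get)


# Hypothetical confidence values for one word image, one dict per classifier.
# These numbers are made up; they are not results from the paper.
word_scores = [
    {"Latin": 0.60, "non-Latin": 0.40},  # e.g. nearest neighbor
    {"Latin": 0.30, "non-Latin": 0.70},  # e.g. weighted Euclidean distance
    {"Latin": 0.45, "non-Latin": 0.55},  # e.g. Gaussian mixture model
    {"Latin": 0.20, "non-Latin": 0.80},  # e.g. support vector machine
]

decision = sum_rule(word_scores)
print(decision)
```

In this toy input the first classifier alone would pick "Latin", while the summed confidences favor "non-Latin"; this shows only how the sum rule lets the other classifiers outvote a single dissenting one, not that the combined decision is more accurate.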