Word level script recognition for Uighur document mixed with English script

Authors:
Hao Ye;Liangrui Peng
Affiliations:
Tsinghua University, Beijing, China;Tsinghua University, Beijing, China
Venue:
Proceedings of the 4th International Workshop on Multilingual OCR
Year:
2013

Abstract

Script recognition is a key technology in Uighur OCR research because English words or sentences commonly appear in Uighur documents, especially scientific ones. This paper presents a word-level script recognition method. The original Uighur text images are segmented into text lines, which are then segmented into word-level images. Features are extracted from sub-blocks of the word-level images. Two features, the edge-hinge feature and the Gabor feature, are introduced and compared. An SVM is adopted as the classifier and trained on labeled segmented word images. The final script recognition result is obtained by fusing the results of the sub-blocks of each segmented word image. Experiments on segmented word images and text line images demonstrate the effectiveness of the proposed method.
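
The sub-block and fusion steps described in the abstract can be illustrated with a minimal sketch. The abstract does not state how sub-blocks are cut or which fusion rule is used, so the fixed-width vertical strips, the majority vote, and the names `split_into_subblocks` and `fuse_by_majority` are all assumptions made for illustration, not the authors' method. Feature extraction and the SVM are left out; the sketch assumes some per-block classifier has already produced a script label for each sub-block.

```python
from collections import Counter
from typing import List, Sequence

# Assumption: a word image is a grayscale raster stored as rows of pixel values.
Image = List[List[int]]


def split_into_subblocks(word: Image, block_width: int) -> List[Image]:
    """Cut a word image into vertical strips of at most `block_width` columns.

    Fixed-width strips are an illustrative choice; the abstract does not
    specify the sub-block layout.
    """
    if block_width <= 0:
        raise ValueError("block_width must be positive")
    width = len(word[0]) if word else 0
    return [
        [row[x:x + block_width] for row in word]
        for x in range(0, width, block_width)
    ]


def fuse_by_majority(labels: Sequence[str]) -> str:
    """Return the most frequent sub-block label as the word-level label.

    Majority voting is an assumed fusion rule. Ties resolve to the label
    that appears first in `labels`.
    """
    if not labels:
        raise ValueError("no sub-block labels to fuse")
    return Counter(labels).most_common(1)[0][0]
```

In use, each strip returned by `split_into_subblocks` would be passed to a feature extractor and a trained classifier (both hypothetical here), and the resulting per-block labels would be passed to `fuse_by_majority` to obtain a single script label for the word.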