Word level script recognition for Uighur document mixed with English script

  • Authors: Hao Ye; Liangrui Peng

  • Affiliations: Tsinghua University, Beijing, China; Tsinghua University, Beijing, China

  • Venue: Proceedings of the 4th International Workshop on Multilingual OCR
  • Year: 2013

Abstract

Script recognition is one of the key technologies in Uighur OCR research, because English words or sentences commonly appear in Uighur documents, especially scientific ones. This paper presents a word-level script recognition method. The original Uighur text images are segmented into text lines, and the text line images are then segmented into word-level images. Features are extracted from sub-blocks of the word-level images. Two features, the edge-hinge feature and the Gabor feature, are introduced and compared. An SVM is adopted as the classifier and trained on the labeled segmented word images. The final script recognition result is obtained by fusing the results of the sub-blocks of each segmented word image. Experiments on segmented word images and text line images demonstrate the effectiveness of the proposed method.
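
The sub-block classification and fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the sub-block size or the fusion rule, so the fixed-width vertical split and the majority vote below are assumptions, and `classify_block` is a hypothetical stand-in for the trained SVM operating on edge-hinge or Gabor features. All function names are illustrative and not taken from the paper.

```python
from collections import Counter


def split_into_sub_blocks(word_image, block_width):
    """Split a word image (a list of pixel rows) into vertical sub-blocks.

    Each sub-block has at most `block_width` columns; the last one may be
    narrower. The fixed-width split is an assumption of this sketch.
    """
    width = len(word_image[0])
    return [
        [row[start:start + block_width] for row in word_image]
        for start in range(0, width, block_width)
    ]


def fuse_by_majority_vote(block_labels):
    """Fuse per-block labels into one word-level label by majority vote.

    Majority voting is assumed here; the paper may use another fusion rule.
    Ties go to the label that was encountered first.
    """
    return Counter(block_labels).most_common(1)[0][0]


def recognize_word_script(word_image, classify_block, block_width=16):
    """Classify every sub-block of a word image, then fuse the decisions.

    `classify_block` is a hypothetical callable standing in for the trained
    SVM: it takes one sub-block and returns a script label.
    """
    blocks = split_into_sub_blocks(word_image, block_width)
    labels = [classify_block(block) for block in blocks]
    return fuse_by_majority_vote(labels)


def toy_classifier(block):
    """Placeholder classifier based on ink density, for demonstration only."""
    pixels = [p for row in block for p in row]
    return "uighur" if sum(pixels) / len(pixels) > 0.5 else "english"


# Toy 4 x 40 binary image: the first 32 columns are ink, the last 8 are blank.
toy_word = [[1] * 32 + [0] * 8 for _ in range(4)]
script = recognize_word_script(toy_word, toy_classifier, block_width=16)
```

In this toy run the image yields three sub-blocks, two of which the placeholder labels as Uighur, so the fused word-level decision follows the majority. A score-level fusion (for example, summing SVM decision values across sub-blocks) would be an equally plausible reading of the abstract.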