Offline Handwritten Arabic Character Segmentation with Probabilistic Model

  • Authors:
  • Pingping Xiu; Liangrui Peng; Xiaoqing Ding; Hua Wang

  • Affiliations:
  • Dept. of Electronic Engineering, Tsinghua University, State Key Laboratory of Intelligent Technology and Systems, Beijing, China (all four authors)

  • Venue:
  • DAS'06: Proceedings of the 7th International Conference on Document Analysis Systems
  • Year:
  • 2006

Abstract

Offline handwritten Arabic character recognition has received increasing attention in recent years because of the growing need to digitize Arabic documents. Arabic handwriting varies widely, and this variability makes character segmentation and recognition difficult. For example, the sub-parts (diacritics) of an Arabic character may be displaced from its main part. This paper proposes a new probabilistic segmentation model. First, a contour-based over-segmentation method cuts the word image into graphemes. The graphemes are sorted into three queues: character main parts, sub-parts (diacritics) above the main parts, and sub-parts below the main parts. The probabilistic model then computes the confidence of each candidate character. It uses both the recognizer output and a geometric confidence, subject to logical constraints. Next, a global optimization finds the optimal cutting path. Its objective function is the weighted average of the character confidences. Experiments on handwritten Arabic documents with various writing styles show that the proposed method is effective.
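The abstract describes the last two steps without formulas: a fused per-character confidence, and a global search for the cutting path that maximizes a weighted average of confidences. The Python sketch below illustrates one way those steps could fit together. It is not the paper's method. Every name here (`combine_confidence`, `best_segmentation`, `max_span`) is hypothetical, and so is every modelling choice:

- main-part graphemes are already ordered along the writing direction and indexed 0..n-1;
- a character is a run of at most `max_span` consecutive graphemes;
- the fused confidence is the product of a recognizer score and a geometric score, forced to zero when a logical constraint fails;
- each character is weighted by the width of its span;
- the diacritic queues are ignored.

With width weights, every complete segmentation covers the same total width. Maximizing the weighted average therefore reduces to maximizing the weighted sum, which dynamic programming solves exactly.

```python
from typing import Callable, List, Sequence, Tuple


def combine_confidence(p_recognizer: float, p_geometry: float,
                       constraint_ok: bool) -> float:
    """Hypothetical fusion rule (an assumption, not the paper's formula):
    product of recognizer and geometric scores, zeroed when a logical
    constraint is violated."""
    if not constraint_ok:
        return 0.0
    return p_recognizer * p_geometry


def best_segmentation(
    widths: Sequence[float],
    confidence: Callable[[int, int], float],
    max_span: int = 4,
) -> Tuple[List[Tuple[int, int]], float]:
    """Group consecutive main-part graphemes into characters.

    widths[k] is the width of grapheme k (hypothetical units, e.g. pixels).
    confidence(i, j) scores graphemes i..j-1 taken as one character.
    Returns the chosen (start, end) spans and their width-weighted
    average confidence.
    """
    if max_span < 1:
        raise ValueError("max_span must be at least 1")
    n = len(widths)
    if n == 0:
        return [], 0.0
    # prefix[j] = total width of graphemes 0..j-1
    prefix = [0.0]
    for w in widths:
        prefix.append(prefix[-1] + w)
    # best[j] = max of sum(width * confidence) over segmentations of 0..j-1;
    # back[j] = start index of the last character in that segmentation.
    best = [float("-inf")] * (n + 1)
    back = [0] * (n + 1)
    best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(max(0, j - max_span), j):
            score = best[i] + (prefix[j] - prefix[i]) * confidence(i, j)
            if score > best[j]:
                best[j], back[j] = score, i
    # Follow back-pointers from the end of the word to recover the cut path.
    spans = []
    j = n
    while j > 0:
        spans.append((back[j], j))
        j = back[j]
    spans.reverse()
    # All full segmentations share the same total width, so dividing the
    # maximal weighted sum by it gives the maximal weighted average.
    return spans, best[n] / prefix[n]


# Toy scores for a three-grapheme word (made-up numbers, not from the paper).
scores = {
    (0, 1): combine_confidence(0.9, 1.0, True),
    (1, 2): combine_confidence(0.8, 0.5, True),
    (2, 3): combine_confidence(0.6, 1.0, True),
    (0, 2): combine_confidence(0.5, 1.0, True),
    (1, 3): combine_confidence(0.8, 1.0, True),
    (0, 3): combine_confidence(0.9, 0.9, False),  # constraint fails -> 0.0
}

if __name__ == "__main__":
    spans, avg = best_segmentation([10, 6, 8],
                                   lambda i, j: scores.get((i, j), 0.0))
    print(spans, round(avg, 4))  # [(0, 1), (1, 3)] 0.8417
```

The exactness of this dynamic program depends on choosing span width as the weight. With other weights, the denominator of the average changes from path to path. Equal weight per character is one example. In that case, a plain sum-maximizing recursion would no longer optimize the average.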