A study on font-family and font-size recognition applied to Arabic word images at ultra-low resolution

  • Authors:
  • Fouad Slimane;Slim Kanoun;Jean Hennebert;Adel M. Alimi;Rolf Ingold

  • Affiliations:
  • DIVA Group, Department of Informatics, University of Fribourg, Bd. de Perolles 90, 1700 Fribourg, Switzerland and REGIM Lab., National School of Engineers (ENIS), University of Sfax, BP 1173, Sfax ...;National School of Engineers (ENIS), University of Sfax, BP 1173, Sfax 3038, Tunisia;DIVA Group, Department of Informatics, University of Fribourg, Bd. de Perolles 90, 1700 Fribourg, Switzerland;REGIM Lab., National School of Engineers (ENIS), University of Sfax, BP 1173, Sfax 3038, Tunisia;DIVA Group, Department of Informatics, University of Fribourg, Bd. de Perolles 90, 1700 Fribourg, Switzerland

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2013

Quantified Score

Hi-index 0.10

Abstract

In this paper, we propose a new font and size identification method for ultra-low resolution Arabic word images using a stochastic approach. The literature has shown that Arabic text recognition systems have difficulty handling multi-font and multi-size word images. This is due to the variability induced by certain font families, in addition to the inherent difficulties of Arabic writing, including its cursive representation, overlaps and ligatures. This work proposes an efficient stochastic approach to the problem of font and size recognition. Our method processes a word image with a fixed-length, overlapping sliding window. Each window is represented by 102 features whose distribution is captured by Gaussian Mixture Models (GMMs). We present three systems: (1) a font recognition system, (2) a size recognition system and (3) a combined font and size recognition system. We demonstrate the importance of identifying the font before recognizing the word images by comparing two multi-font Arabic OCR systems (cascading and global). The cascading system is about 23% better than the global multi-font system in terms of word recognition rate on the Arabic Printed Text Image (APTI) database, which is freely available to the scientific community.
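The sliding-window and GMM scoring idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two toy features stand in for the 102 features (which the abstract does not list), the GMM parameters are supplied by hand rather than trained, the covariances are assumed diagonal, and summing per-window log-likelihoods assumes the windows are independent. All function names (`sliding_windows`, `toy_features`, `gmm_log_likelihood`, `classify`) are hypothetical.

```python
import math


def sliding_windows(image, width, step):
    """Yield fixed-width column windows; step < width makes them overlap.

    `image` is a list of rows, each a list of pixel values (1 = ink).
    """
    n_cols = len(image[0])
    for start in range(0, n_cols - width + 1, step):
        yield [row[start:start + width] for row in image]


def toy_features(window):
    """Hypothetical 2-feature stand-in for the paper's 102 features:
    ink density and normalized vertical centre of mass."""
    height = len(window)
    total = sum(sum(row) for row in window)
    density = total / (height * len(window[0]))
    if total == 0:
        centre = 0.5  # no ink: fall back to the middle of the window
    else:
        weighted = sum(i * sum(row) for i, row in enumerate(window))
        centre = weighted / (total * max(height - 1, 1))
    return [density, centre]


def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, gmm):
    """Log-likelihood of x under a GMM given as (weight, mean, var) tuples.

    Uses the log-sum-exp trick for numerical stability.
    """
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def classify(image, models, width=4, step=1):
    """Return the label whose GMM gives the highest summed log-likelihood
    over all windows (assumes windows are scored independently)."""
    scores = {label: 0.0 for label in models}
    for window in sliding_windows(image, width, step):
        x = toy_features(window)
        for label, gmm in models.items():
            scores[label] += gmm_log_likelihood(x, gmm)
    return max(scores, key=scores.get)
```

In this sketch, `models` maps each class label (a font, a size, or a font and size pair) to its own GMM, so the same `classify` function covers all three systems by changing the label set. In practice the mixture parameters would be estimated from training data, for example with the EM algorithm.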