Lexicon reduction using dots for off-line Farsi/Arabic handwritten word recognition

  • Authors:
  • Saeed Mozaffari; Karim Faez; Volker Märgner; Haikal El-Abed

  • Affiliations:
  • Pattern Recognition and Image Processing Laboratory, Electrical Engineering Department, Amirkabir University of Technology, Tehran 15914, Iran; Pattern Recognition and Image Processing Laboratory, Electrical Engineering Department, Amirkabir University of Technology, Tehran 15914, Iran; Institute of Communications Technology (IFN), Technical University of Braunschweig, Braunschweig 38092, Germany; Institute of Communications Technology (IFN), Technical University of Braunschweig, Braunschweig 38092, Germany

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2008

Abstract

Unlike the scripts of many other languages, Farsi relies heavily on dots: 18 of its 32 characters carry dots, which appear in groups of one, two or three. Some of these letters share a common primary shape and differ only in the number of dots and in whether the dots lie above or below that shape. In this paper, a new concept of using the dots in a cursively handwritten Farsi/Arabic word for lexicon reduction is introduced, and a fast method for extracting dots is presented. The technique extracts and represents the number and position of dots in off-line handwritten words in order to eliminate unlikely candidates. Experimental results on a set of 12,000 handwritten word images yield a lexicon reduction of 93% with an accuracy of 85%. The proposed lexicon reduction algorithm achieves a speedup factor of 2 as well as a 13% improvement in recognition rate.
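
The filtering idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `LETTER_DOTS`, `dot_signature`, `reduce_lexicon` and `reduction_rate` are hypothetical, the letter table is deliberately partial, the exact-match rule is a simplifying assumption, and the "observed" signature is typed in by hand rather than extracted from an image.

```python
from typing import Dict, List, Tuple

# A dot group is (number of dots, position relative to the primary shape).
DotGroup = Tuple[int, str]

# Partial, illustrative table of dotted letters (assumption: only these five
# are modelled here; the abstract states that 18 of 32 Farsi letters have dots).
LETTER_DOTS: Dict[str, DotGroup] = {
    "ب": (1, "below"),
    "پ": (3, "below"),
    "ت": (2, "above"),
    "ث": (3, "above"),
    "ن": (1, "above"),
}


def dot_signature(word: str) -> List[DotGroup]:
    """Return the dot groups of a word in reading order (undotted letters are skipped)."""
    return [LETTER_DOTS[ch] for ch in word if ch in LETTER_DOTS]


def reduce_lexicon(lexicon: List[str], observed: List[DotGroup]) -> List[str]:
    """Keep only the entries whose dot signature equals the observed one.

    Exact matching is an assumption made for brevity; handwritten dots are
    often merged or misplaced, so a practical system would need some tolerance.
    """
    return [word for word in lexicon if dot_signature(word) == observed]


def reduction_rate(lexicon: List[str], kept: List[str]) -> float:
    """Fraction of the lexicon that was eliminated."""
    return 1.0 - len(kept) / len(lexicon)


# Toy lexicon; the observed signature stands in for the output of a dot extractor.
lexicon = ["نان", "باب", "تاب", "بت", "پدر", "مادر"]
observed = [(2, "above"), (1, "below")]

kept = reduce_lexicon(lexicon, observed)
print(kept, reduction_rate(lexicon, kept))
```

On this toy lexicon only "تاب" survives, so five of the six entries are discarded before any recognizer runs. The trade-off reported in the abstract (93% reduction at 85% accuracy) reflects the fact that a stricter filter removes more candidates but also risks removing the correct word when the dots are extracted incorrectly.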