W-TSV: Weighted topological signature vector for lexicon reduction in handwritten Arabic documents

  • Authors:
  • Youssouf Chherawala;Mohamed Cheriet

  • Affiliations:
  • Synchromedia Laboratory, École de Technologie Supérieure, 1100 Notre-Dame Ouest, Montreal, QC, Canada;Synchromedia Laboratory, École de Technologie Supérieure, 1100 Notre-Dame Ouest, Montreal, QC, Canada

  • Venue:
  • Pattern Recognition
  • Year:
  • 2012

Abstract

This paper proposes a holistic lexicon-reduction method for ancient and modern handwritten Arabic documents. The word shape is represented by the weighted topological signature vector (W-TSV), which encodes graph data into a low-dimensional vector space. Three directed acyclic graph (DAG) representations are proposed for Arabic word shapes, based on topological and geometrical features. Lexicon reduction is achieved by a nearest-neighbor search in the W-TSV space. The proposed framework has been tested on the IFN/ENIT and Ibn Sina databases, achieving degrees of reduction of 83.5% and 92.9%, respectively, at an accuracy of reduction of 90%.
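
The nearest-neighbor reduction step and the two reported measures can be illustrated with a minimal sketch. It does not compute the W-TSV itself: the three-dimensional vectors, the word labels, the Euclidean distance, and the function names below are all hypothetical placeholders. The sketch also assumes the usual lexicon-reduction definitions, which the abstract does not spell out: degree of reduction as the fraction of the lexicon that is discarded, and accuracy of reduction as the fraction of queries whose true word survives in the reduced lexicon.

```python
import math


def reduce_lexicon(query, lexicon, k):
    """Keep the k lexicon words whose vectors lie closest to the query.

    `lexicon` maps a word label to its vector. Euclidean distance is an
    assumption made for illustration, not the paper's stated metric.
    """
    ranked = sorted(lexicon, key=lambda word: math.dist(query, lexicon[word]))
    return ranked[:k]


def degree_of_reduction(lexicon_size, reduced_size):
    """Fraction of the lexicon discarded by the reduction (assumed definition)."""
    return 1.0 - reduced_size / lexicon_size


def accuracy_of_reduction(queries, lexicon, k):
    """Fraction of queries whose true word remains in the reduced lexicon."""
    hits = sum(
        1 for vector, truth in queries
        if truth in reduce_lexicon(vector, lexicon, k)
    )
    return hits / len(queries)


# Toy data: placeholder vectors standing in for word-shape signatures.
lexicon = {
    "word_a": (0.0, 0.0, 1.0),
    "word_b": (0.1, 0.0, 0.9),
    "word_c": (5.0, 5.0, 5.0),
    "word_d": (9.0, 1.0, 0.0),
}
queries = [
    ((0.05, 0.0, 1.0), "word_a"),
    ((8.5, 1.2, 0.1), "word_d"),
]

reduced = reduce_lexicon(queries[0][0], lexicon, k=2)
print(reduced)
print(degree_of_reduction(len(lexicon), len(reduced)))
print(accuracy_of_reduction(queries, lexicon, k=2))
```

Under these assumptions, a smaller `k` raises the degree of reduction but makes it more likely that the true word is dropped, which is the trade-off behind reporting a degree of reduction at a fixed accuracy of reduction.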