A pictorial dictionary for printed Farsi subwords

  • Authors:
  • Afshin Ebrahimi; Ehsanollah Kabir

  • Affiliations:
  • Department of Electrical Engineering, Sahand University of Technology (SUT), Tabriz, Iran; Department of Electrical Engineering, Tarbiat Modarres University, Tehran, Iran

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2008

Abstract

In this paper, we report on the use of characteristic loci features to cluster printed Farsi subwords based on their holistic shapes. This yields a pictorial dictionary that can be used in a word recognition system to reduce the search space. The feature vectors are compressed using PCA. The k-means algorithm is used to cluster 113,340 subwords in 4 fonts and 3 sizes into 300 clusters. The smallest cluster has 59 members and the largest has 876. The mean of each cluster is used as its entry in the pictorial dictionary. To evaluate the clustering results, a minimum mean-distance classifier was used to test a set of 5000 subwords. Of these subwords, 78.71%, 99.01% and 100% had their correct cluster among the first, first five and first 10 closest clusters, respectively.
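The evaluation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`cluster_means`, `rank_clusters`, `top_n_rate`), the toy 2-D vectors, and the hand-assigned labels are all assumptions made for illustration, standing in for PCA-compressed characteristic loci features and k-means output. The sketch takes cluster means as dictionary entries, ranks clusters by Euclidean distance to a test vector, and reports how often the true cluster falls among the n closest.

```python
def cluster_means(vectors, labels, k):
    """Mean vector of each cluster; these play the role of dictionary entries."""
    dim = len(vectors[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for vec, lab in zip(vectors, labels):
        counts[lab] += 1
        for i, x in enumerate(vec):
            sums[lab][i] += x
    # Assumption: every cluster has at least one member.
    return [[s / counts[c] for s in sums[c]] for c in range(k)]


def sq_dist(a, b):
    """Squared Euclidean distance (gives the same ranking as the distance itself)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rank_clusters(vec, means):
    """Cluster indices ordered from nearest mean to farthest mean."""
    return sorted(range(len(means)), key=lambda c: sq_dist(vec, means[c]))


def top_n_rate(test_vectors, true_labels, means, n):
    """Fraction of test vectors whose true cluster is among the n nearest means."""
    hits = sum(
        1
        for vec, lab in zip(test_vectors, true_labels)
        if lab in rank_clusters(vec, means)[:n]
    )
    return hits / len(test_vectors)


# Toy 2-D data (hypothetical) standing in for compressed feature vectors.
train = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0), (5.0, 9.0), (5.0, 11.0)]
labels = [0, 0, 1, 1, 2, 2]  # hand-assigned here; k-means would supply these
means = cluster_means(train, labels, k=3)

test = [(1.0, 1.0), (6.0, 1.0), (5.0, 8.0), (4.0, 1.0)]
test_labels = [0, 0, 2, 1]
top1 = top_n_rate(test, test_labels, means, n=1)
top2 = top_n_rate(test, test_labels, means, n=2)
print(top1, top2)  # 0.5 1.0
```

In this toy run, two of the four test vectors lie closer to a neighbouring cluster's mean than to their own, so the top-1 rate is lower than the top-2 rate. The abstract reports the same kind of effect: a recognizer that searches only the few nearest dictionary entries can still retain the correct candidate.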