An improved fast edit approach for two-string approximated mean computation applied to OCR

  • Authors:
  • J. Abreu; J. R. Rico-Juan

  • Affiliations:
  • Dpto Informática, Universidad de Matanzas, Carretera a Varadero Km. 3 1/2, Matanzas, Cuba; Dpto Lenguajes y Sistemas Informáticos, Universidad de Alicante, San Vicente del Raspeig, Alicante, Spain

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2013

Abstract

This paper presents a new fast algorithm for computing an approximation to the mean of two character strings, each representing a 2D shape, and applies it to a new Wilson-based editing procedure. The approximate mean is built by including some symbols from the two original strings. A Greedy variant of the algorithm is also studied, which reduces the time required to compute the approximate mean. The new dataset editing scheme relaxes the deletion criterion of the Wilson editing procedure: in practice, not all instances misclassified by their nearest neighbors are pruned. Instead, an artificial instance is added to the dataset in the hope that the misclassified instance will be classified correctly in the future. This artificial instance is the approximate mean of the misclassified sample and its same-class nearest neighbor. Experiments on three widely known contour databases show that the proposed algorithm performs very well when computing the mean of two strings and outperforms methods proposed by other authors. In particular, the low computational time required by the heuristic approach makes it well suited to long strings. Results also show that the proposed preprocessing scheme can reduce the classification error in about 83% of trials. There is empirical evidence that using the Greedy approximation to compute the approximate mean does not affect the performance of the editing procedure.
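The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes unit-cost Levenshtein edits, takes the "mean" of two strings to be the string obtained by applying the first half of one optimal edit script, and uses a 1-nearest-neighbor leave-one-out check in place of the Wilson k-NN rule. The abstract does not state which misclassified instances are still pruned, so this sketch prunes none of them. The names `edit_script`, `approximate_mean`, and `relaxed_edit` are hypothetical.

```python
from typing import List, Tuple

Op = Tuple[str, int, str]  # (kind, position in a, symbol)

def edit_script(a: str, b: str) -> Tuple[int, List[Op]]:
    """Unit-cost Levenshtein distance and one optimal script turning a into b.
    Operations are listed right to left, so they can be applied in order."""
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    ops: List[Op] = []
    i, j = n, m
    while i > 0 or j > 0:  # backtrace from the bottom-right cell
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            if a[i - 1] != b[j - 1]:
                ops.append(("sub", i - 1, b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(("del", i - 1, ""))
            i -= 1
        else:
            ops.append(("ins", i, b[j - 1]))
            j -= 1
    return d[n][m], ops

def approximate_mean(a: str, b: str) -> str:
    """Assumption: apply the first half of the edit script to a."""
    dist, ops = edit_script(a, b)
    s = list(a)
    for kind, pos, sym in ops[: dist // 2]:
        if kind == "sub":
            s[pos] = sym
        elif kind == "del":
            del s[pos]
        else:
            s.insert(pos, sym)
    return "".join(s)

def relaxed_edit(data: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """For each instance misclassified by its nearest neighbor, add the
    approximate mean of the instance and its same-class nearest neighbor."""
    out = list(data)
    for idx, (x, label) in enumerate(data):
        others = [(edit_script(x, s)[0], s, c)
                  for k, (s, c) in enumerate(data) if k != idx]
        if not others or min(others, key=lambda t: t[0])[2] == label:
            continue  # correctly classified (or nothing to compare against)
        same = [t for t in others if t[2] == label]
        if same:  # assumption: skip instances with no same-class neighbor
            neighbor = min(same, key=lambda t: t[0])[1]
            out.append((approximate_mean(x, neighbor), label))
    return out

if __name__ == "__main__":
    toy = [("aaaa", "A"), ("aaab", "A"), ("bbbb", "B"), ("abbb", "A")]
    print(relaxed_edit(toy))
```

With unit costs, the string returned by `approximate_mean` lies on an optimal edit path between the two inputs, so its distances to them sum to the distance between them. Computing the full dynamic-programming table costs time proportional to the product of the string lengths, which is the cost the Greedy variant mentioned in the abstract is meant to reduce.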