An improved fast edit approach for two-string approximated mean computation applied to OCR

Authors:
J. Abreu; J. R. Rico-Juan
Affiliations:
Dpto Informática, Universidad de Matanzas, Carretera a Varadero Km. 3 1/2, Matanzas, Cuba; Dpto Lenguajes y Sistemas Informáticos, Universidad de Alicante, San Vicente del Raspeig, Alicante, Spain
Venue:
Pattern Recognition Letters
Year:
2013


Abstract

This paper presents a new fast algorithm for computing an approximation to the mean of two strings of characters representing a 2D shape, and applies it in a new Wilson-based editing procedure. The approximate mean is built from symbols taken from the two original strings. A greedy variant of the algorithm is also studied, which reduces the time required to compute the approximate mean. The new dataset editing scheme relaxes the deletion criterion of the Wilson editing procedure: in practice, not all instances misclassified by their nearest neighbors are pruned. Instead, an artificial instance is added to the dataset in the hope that the misclassified instance will be classified correctly in the future. The artificial instance is the approximate mean of the misclassified sample and its same-class nearest neighbor. Experiments on three widely known contour databases show that the proposed algorithm performs very well when computing the mean of two strings and outperforms methods proposed by other authors. In particular, the low computational time of the heuristic approach makes it well suited to long strings. Results also show that the proposed preprocessing scheme can reduce the classification error in about 83% of trials. There is empirical evidence that using the greedy approximation to compute the approximate mean does not affect the performance of the editing procedure.
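The editing idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the approximate mean here is a stand-in that applies half of the operations of a unit-cost Levenshtein edit script, and the editing loop simply keeps every misclassified instance and adds an artificial one, because the abstract does not state the exact pruning criterion. The names `edit_script`, `approx_mean`, and `relaxed_wilson_edit`, the unit costs, and the parameter `k` are all assumptions made for illustration.

```python
from collections import Counter


def edit_script(a, b):
    """Unit-cost Levenshtein DP; returns (distance, edit operations a -> b)."""
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    # Backtrace one optimal path from the bottom-right corner.
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            kind = "keep" if a[i - 1] == b[j - 1] else "sub"
            ops.append((kind, a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(("del", a[i - 1], None))
            i -= 1
        else:
            ops.append(("ins", None, b[j - 1]))
            j -= 1
    ops.reverse()
    return d[n][m], ops


def approx_mean(a, b):
    """Stand-in mean: apply the first half of the edit operations to a."""
    dist, ops = edit_script(a, b)
    budget = dist // 2  # number of non-trivial operations to apply
    out = []
    for kind, x, y in ops:
        if kind == "keep":
            out.append(x)
        elif budget > 0:
            budget -= 1
            if y is not None:  # applied sub/ins: take the symbol from b
                out.append(y)
        elif x is not None:  # unapplied sub/del: keep the symbol from a
            out.append(x)
    return "".join(out)


def relaxed_wilson_edit(data, k=3):
    """data: list of (string, label). Returns data plus artificial instances."""
    def dist(s, t):
        return edit_script(s, t)[0]

    added = []
    for idx, (x, label) in enumerate(data):
        others = data[:idx] + data[idx + 1:]
        neighbours = sorted(others, key=lambda item: dist(x, item[0]))[:k]
        vote = Counter(lab for _, lab in neighbours).most_common(1)[0][0]
        if vote != label:
            # Assumption: never prune; add the mean with the same-class neighbour.
            same = [item for item in others if item[1] == label]
            if same:
                friend = min(same, key=lambda item: dist(x, item[0]))
                added.append((approx_mean(x, friend[0]), label))
    return data + added
```

With unit costs, a string built by applying part of an optimal edit script lies on a shortest path between the two inputs, so its distances to both inputs sum to the distance between them. The paper's own construction and its greedy variant may choose symbols differently.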