Some experiments on clustering a set of strings

  • Authors: Jean-Michel Jolion
  • Affiliations: Lyon Research Center for Images and Information Systems, INSA Lyon, Villeurbanne Cedex, France
  • Venue: GbRPR'03: Proceedings of the 4th IAPR International Conference on Graph Based Representations in Pattern Recognition
  • Year: 2003

Abstract

This paper introduces the concept of set deviation as a tool for characterizing how a set of strings deviates around its set median. The set deviation is defined as the set median of the positive edit sequences between each string and the set median. We show how the set deviation can be used efficiently in well-known statistical estimators, in particular the minimum volume ellipsoid estimator. The concept is illustrated on several examples, notably the clustering of a set of shapes coded as strings using the Freeman code.
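As a rough illustration of the set median that the abstract builds on, the following Python sketch picks the string in a set that minimizes the sum of edit distances to all members of the set. The abstract does not say which edit costs are used, so unit-cost Levenshtein distance is an assumption here, and the names `edit_distance` and `set_median` are hypothetical helpers, not taken from the paper. The set deviation itself is not implemented, since its construction from edit sequences is not detailed in the abstract.

```python
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance (assumed cost model, not the paper's)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if ca == cb else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def set_median(strings: Sequence[str]) -> str:
    """Return the member of `strings` with the smallest total distance to the set."""
    if not strings:
        raise ValueError("set_median requires at least one string")
    return min(strings, key=lambda s: sum(edit_distance(s, t) for t in strings))


if __name__ == "__main__":
    # Illustrative digit strings in the style of Freeman chain codes.
    shapes = ["0011", "0012", "0111", "7011"]
    print(set_median(shapes))
```

This brute-force search needs a quadratic number of distance computations in the size of the set, which is acceptable for small examples; ties are resolved in favor of the first candidate encountered.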