Metric structures on datasets: stability and classification of algorithms

  • Authors:
  • Facundo Mémoli

  • Affiliations:
  • Department of Mathematics, Stanford University, California and Department of Computer Science, The University of Adelaide, Australia

  • Venue:
  • CAIP'11 Proceedings of the 14th international conference on Computer analysis of images and patterns - Volume Part II
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

Several methods in data and shape analysis can be regarded as transformations between metric spaces. Examples are hierarchical clustering methods, the higher order constructions of computational persistent topology, and several computational techniques that operate within the context of data/shape matching under invariances. Metric geometry, and in particular different variants of the Gromov-Hausdorff distance provide a point of view which is applicable in different scenarios. The underlying idea is to regard datasets as metric spaces, or metric measure spaces (a.k.a. mm-spaces, which are metric spaces enriched with probability measures), and then, crucially, at the same time regard the collection of all datasets as a metric space in itself. Variations of this point of view give rise to different taxonomies that include several methods for extracting information from datasets. Imposing metric structures on the collection of all datasets could be regarded as a "soft" construction. The classification of algorithms, or the axiomatic characterization of them, could be achieved by imposing the more "rigid" category structures on the collection of all finite metric spaces and demanding functoriality of the algorithms. In this case, one would hope to single out all the algorithms that satisfy certain natural conditions, which would clarify the landscape of available methods. We describe how using this formalism leads to an axiomatic description of many clustering algorithms, both flat and hierarchical.