Earth mover distance over high-dimensional spaces

  • Authors:
  • Alexandr Andoni;Piotr Indyk;Robert Krauthgamer

  • Affiliations:
  • MIT;MIT;Weizmann Institute and IBM Almaden

  • Venue:
  • Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

The Earth Mover Distance (EMD) between two equal-size sets of points in ℝd is defined to be the minimum cost of a bipartite matching between the two pointsets. It is a natural metric for comparing sets of features, and as such, it has received significant interest in computer vision. Motivated by recent developments in that area, we address computational problems involving EMD over high-dimensional pointsets. A natural approach is to embed the EMD metric into l1, and use the algorithms designed for the latter space. However, Khot and Naor [KN06] show that any embedding of EMD over the d-dimensional Hamming cube into l1 must incur a distortion Ω(d), thus practically losing all distance information. We circumvent this roadblock by focusing on sets with cardinalities upper-bounded by a parameter s, and achieve a distortion of only O(log s · log d). Since in applications the feature sets have bounded size, the resulting distortion is much smaller than the Ω(d) lower bound. Our approach is quite general and easily extends to EMD over ℝd. We then provide a strong lower bound on the multi-round communicatic complexity of estimating EMD, which in particular strengthens the known non-embeddability result of [KN06]. Our bound exhibits a smooth tradeoff between approximation and communication, and for example implies that every algorithm that estimates EMD using constant size sketches can only achieve ω(log s) approximation.