Earth mover distance over high-dimensional spaces

Authors:
Alexandr Andoni;Piotr Indyk;Robert Krauthgamer
Affiliations:
MIT;MIT;Weizmann Institute and IBM Almaden
Venue:
Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms
Year:
2008

Citing 22
Cited 10

Approximate nearest neighbors: towards removing the curse of dimensionality

STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing
Efficient search for approximate nearest neighbor in high dimensional spaces

STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing
Approximate nearest neighbors and sequence comparison with block operations

STOC '00 Proceedings of the thirty-second annual ACM symposium on Theory of computing
The Earth Mover's Distance as a Metric for Image Retrieval

International Journal of Computer Vision
Space lower bounds for distance approximation in the data stream model

STOC '02 Proceedings of the thirty-fourth annual ACM symposium on Theory of computing
Similarity estimation techniques from rounding algorithms

STOC '02 Proceedings of the thirty-fourth annual ACM symposium on Theory of computing
Approximating a Finite Metric by a Small Number of Tree Metrics

FOCS '98 Proceedings of the 39th Annual Symposium on Foundations of Computer Science
On the Impossibility of Dimension Reduction in \ell _1

FOCS '03 Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science
A near-linear constant-factor approximation for euclidean bipartite matching?

SCG '04 Proceedings of the twentieth annual symposium on Computational geometry
An information statistics approach to data stream and communication complexity

Journal of Computer and System Sciences - Special issue on FOCS 2002
Image similarity search with compact data structures

Proceedings of the thirteenth ACM international conference on Information and knowledge management
Low distortion embeddings for edit distance

Proceedings of the thirty-seventh annual ACM symposium on Theory of computing
Efficient Image Matching with Distributions of Local Invariant Features

CVPR '05 Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Volume 2
The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features

ICCV '05 Proceedings of the Tenth IEEE International Conference on Computer Vision - Volume 2
Improved lower bounds for embeddings into L1

SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithms
Entropy based nearest neighbor search in high dimensions

SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithms
Unsupervised Learning of Categories from Sets of Partially Matching Image Features

CVPR '06 Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 1
The string edit distance matching problem with moves

ACM Transactions on Algorithms (TALG)
Ferret: a toolkit for content-based similarity search of feature-rich data

Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006
Uncertainty principles, extractors, and explicit embeddings of l2 into l1

Proceedings of the thirty-ninth annual ACM symposium on Theory of computing
A near linear time constant factor approximation for Euclidean bichromatic matching (cost)

SODA '07 Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms
The Computational Hardness of Estimating Edit Distance [Extended Abstract]

FOCS '07 Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science

Overcoming the l1 non-embeddability barrier: algorithms for product metrics

SODA '09 Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms
Approximating edit distance in near-linear time

Proceedings of the forty-first annual ACM symposium on Theory of computing
Effectiveness of optimal incremental multi-step nearest neighbor search

Expert Systems with Applications: An International Journal
Efficient and effective similarity search over probabilistic data based on earth mover's distance

Proceedings of the VLDB Endowment
The Computational Hardness of Estimating Edit Distance

SIAM Journal on Computing
Comparing distributions and shapes using the kernel distance

Proceedings of the twenty-seventh annual symposium on Computational geometry
NNS lower bounds via metric expansion for l

ICALP'12 Proceedings of the 39th international colloquium conference on Automata, Languages, and Programming - Volume Part I
Homomorphic fingerprints under misalignments: sketching edit and shift distances

Proceedings of the forty-fifth annual ACM symposium on Theory of computing
Moving heaven and earth: distances between distributions

ACM SIGACT News
Optimal Collapsing Protocol for Multiparty Pointer Jumping

Theory of Computing Systems

Abstract

The Earth Mover Distance (EMD) between two equal-size sets of points in ℝ^d is defined to be the minimum cost of a bipartite matching between the two pointsets. It is a natural metric for comparing sets of features, and as such, it has received significant interest in computer vision. Motivated by recent developments in that area, we address computational problems involving EMD over high-dimensional pointsets. A natural approach is to embed the EMD metric into ℓ1, and use the algorithms designed for the latter space. However, Khot and Naor [KN06] show that any embedding of EMD over the d-dimensional Hamming cube into ℓ1 must incur a distortion Ω(d), thus practically losing all distance information. We circumvent this roadblock by focusing on sets with cardinalities upper-bounded by a parameter s, and achieve a distortion of only O(log s · log d). Since in applications the feature sets have bounded size, the resulting distortion is much smaller than the Ω(d) lower bound. Our approach is quite general and easily extends to EMD over ℝ^d. We then provide a strong lower bound on the multi-round communication complexity of estimating EMD, which in particular strengthens the known non-embeddability result of [KN06]. Our bound exhibits a smooth tradeoff between approximation and communication, and for example implies that every algorithm that estimates EMD using constant size sketches can only achieve Ω(log s) approximation.
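
To make the definition in the abstract concrete, the following is a minimal sketch that computes EMD between two equal-size pointsets by exhaustive search over all perfect matchings. It is not the paper's method: the function names `l1` and `emd_bruteforce` are hypothetical, the choice of the ℓ1 norm as the ground distance is an assumption made for illustration, and the factorial running time limits it to very small sets.

```python
from itertools import permutations


def l1(p, q):
    """l1 (Manhattan) distance between two points of equal dimension.

    Assumption: l1 is used as the ground distance purely for illustration.
    """
    return sum(abs(a - b) for a, b in zip(p, q))


def emd_bruteforce(A, B, dist=l1):
    """Minimum cost of a perfect bipartite matching between A and B.

    Tries every permutation of B, so it runs in O(n! * n) time and is
    only suitable for tiny inputs.
    """
    if len(A) != len(B):
        raise ValueError("pointsets must have equal size")
    n = len(A)
    # Each permutation assigns point A[i] to point B[perm[i]].
    return min(
        sum(dist(A[i], B[perm[i]]) for i in range(n))
        for perm in permutations(range(n))
    )


if __name__ == "__main__":
    A = [(0, 0), (1, 1)]
    B = [(0, 1), (1, 0)]
    print(emd_bruteforce(A, B))
```

For pointsets of realistic size, a polynomial-time assignment solver (for example the Hungarian algorithm) would replace the exhaustive search; the value computed is the same.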