Thick boundaries in binary space and their influence on nearest-neighbor search

Authors:
Tomasz Trzcinski;Vincent Lepetit;Pascal Fua
Affiliations:
Ecole Polytechnique Fédérale de Lausanne (EPFL), Computer Vision Laboratory, CH-1015 Lausanne, Switzerland;Ecole Polytechnique Fédérale de Lausanne (EPFL), Computer Vision Laboratory, CH-1015 Lausanne, Switzerland;Ecole Polytechnique Fédérale de Lausanne (EPFL), Computer Vision Laboratory, CH-1015 Lausanne, Switzerland
Venue:
Pattern Recognition Letters
Year:
2012

Abstract

Binary descriptors allow faster similarity computation than real-valued ones while requiring much less storage. As a result, many algorithms have recently been proposed to binarize floating-point descriptors so that they can be searched quickly. Unfortunately, even when the similarity between vectors can be computed quickly, exhaustive linear search remains impractical for truly large databases, and approximate nearest neighbor (ANN) search is still required. It is therefore surprising that relatively little attention has been paid to the efficiency of ANN algorithms on binary vectors, which is the focus of this paper. We first show that binary-space Voronoi diagrams have thick boundaries, meaning that many points lie at the same distance from two random points. This violates the implicit assumption made by most ANN algorithms that points can be neatly assigned to clusters centered around a set of cluster centers. As a result, state-of-the-art algorithms that can operate on binary vectors exhibit much lower performance than those that work with floating-point ones. This analysis is the first contribution of the paper. The second is a pair of effective ways to overcome this limitation, by appropriately randomizing either a tree-based or a hashing-based algorithm. In both cases, we show that we obtain precision/recall curves similar to those that can be obtained using floating-point calculation, but at much reduced computational cost.
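
The thick-boundary claim can be illustrated with a small sketch. It is not taken from the paper: the function names, the uniform sampling of points, and the default trial count are assumptions made for illustration only. A point x is equidistant in Hamming distance from p and q exactly when, among the d bits where p and q differ, x agrees with p on d/2 of them, so ties need an even d and are then fairly likely. For real-valued vectors drawn from a continuous distribution, exact ties occur with probability zero.

```python
import random
from math import comb


def hamming(a: int, b: int) -> int:
    """Hamming distance between two bit vectors stored as Python ints."""
    return bin(a ^ b).count("1")


def exact_tie_probability(d: int) -> float:
    """Probability that a uniformly random binary point is equidistant
    from two points p and q that differ in exactly d bits.

    Bits where p and q agree contribute equally to both distances, so
    only the d differing bits matter: the point must match p on exactly
    d/2 of them, which is impossible when d is odd.
    """
    if d % 2 == 1:
        return 0.0
    return comb(d, d // 2) / 2 ** d


def tie_fraction(n_bits: int, n_trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of how often a random point lies on the
    boundary between two random points (illustrative assumption:
    all three points are drawn uniformly from {0, 1}^n_bits)."""
    rng = random.Random(seed)
    ties = 0
    for _ in range(n_trials):
        p = rng.getrandbits(n_bits)
        q = rng.getrandbits(n_bits)
        x = rng.getrandbits(n_bits)
        if hamming(x, p) == hamming(x, q):
            ties += 1
    return ties / n_trials


if __name__ == "__main__":
    for n_bits in (8, 32, 128):
        print(n_bits, tie_fraction(n_bits))
```

Under these assumptions the estimated fraction of boundary points stays well above zero at every tested dimension, which is the behavior the abstract describes as a thick boundary.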