Multidimensional spectral hashing

Authors:
Yair Weiss;Rob Fergus;Antonio Torralba
Affiliations:
School of Computer Science, Hebrew University, Israel;Dept. of Computer Science, Courant Institute, New York University;CSAIL, MIT
Venue:
ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part V
Year:
2012

Citing 5
Cited 1

Normalized Cuts and Image Segmentation

IEEE Transactions on Pattern Analysis and Machine Intelligence
Similarity Search in High Dimensions via Hashing

VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions

FOCS '06 Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science
Towards a theoretical foundation for Laplacian-based manifold methods

Journal of Computer and System Sciences
Complementary hashing for approximate nearest neighbor search

ICCV '11 Proceedings of the 2011 International Conference on Computer Vision

Order preserving hashing for approximate nearest neighbor search

Proceedings of the 21st ACM international conference on Multimedia

Quantified Score

Hi-index	0.00

Visualization

Abstract

With the growing availability of very large image databases, there has been a surge of interest in methods based on "semantic hashing", i.e. compact binary codes of data-points so that the Hamming distance between codewords correlates with similarity. In reviewing and comparing existing methods, we show that their relative performance can change drastically depending on the definition of ground-truth neighbors. Motivated by this finding, we propose a new formulation for learning binary codes which seeks to reconstruct the affinity between datapoints, rather than their distances. We show that this criterion is intractable to solve exactly, but a spectral relaxation gives an algorithm where the bits correspond to thresholded eigenvectors of the affinity matrix, and as the number of datapoints goes to infinity these eigenvectors converge to eigenfunctions of Laplace-Beltrami operators, similar to the recently proposed Spectral Hashing (SH) method. Unlike SH whose performance may degrade as the number of bits increases, the optimal code using our formulation is guaranteed to faithfully reproduce the affinities as the number of bits increases. We show that the number of eigenfunctions needed may increase exponentially with dimension, but introduce a "kernel trick" to allow us to compute with an exponentially large number of bits but using only memory and computation that grows linearly with dimension. Experiments shows that MDSH outperforms the state-of-the art, especially in the challenging regime of small distance thresholds.