Optimal hash functions for approximate matches on the n-cube

Authors:
Daniel M. Gordon;Victor S. Miller;Peter Ostapenko
Affiliations:
IDA Center for Communications Research, San Diego, CA;IDA Center for Communications Research, Princeton, NJ;IDA Center for Communications Research, San Diego, CA
Venue:
IEEE Transactions on Information Theory
Year:
2010

Abstract

One way to find near-matches in large datasets is to use hash functions. In recent years, locality-sensitive hash functions have been given for various metrics; for the Hamming metric, projecting onto k bits is a simple hash function that performs well. In this paper, we investigate alternatives to projection. For various parameters, hash functions given by complete decoding algorithms for error-correcting codes work better than projection, and asymptotically, random codes perform better than projection.
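
To make the two kinds of hash function in the abstract concrete, the following is a minimal sketch rather than the authors' implementation. It assumes n = 7, a projection onto the first 4 bits, and the [7,4] Hamming code with syndrome decoding as the "complete decoding algorithm"; the function names and the i.i.d. bit-flip model with rate p are illustrative assumptions. Both hashes split the 7-cube into 16 buckets of 8 points, and the sketch computes by exhaustive enumeration how often a point and its noisy copy land in the same bucket. It only shows the mechanics and is not meant to reproduce the parameter ranges in which the paper reports codes doing better.

```python
from itertools import product

N = 7  # block length of the [7,4] Hamming code (assumed for illustration)


def projection_hash(x, k=4):
    """Hash a bit tuple by keeping its first k coordinates."""
    return tuple(x[:k])


def hamming_decode_hash(x):
    """Hash a 7-bit tuple to the nearest codeword of the [7,4] Hamming code.

    Column j (1-indexed) of the parity-check matrix is j written in binary,
    so the syndrome equals the XOR of the positions that hold a 1.
    """
    syndrome = 0
    for position, bit in enumerate(x, start=1):
        if bit:
            syndrome ^= position
    y = list(x)
    if syndrome:
        y[syndrome - 1] ^= 1  # flip the single bit the syndrome points at
    return tuple(y)


def collision_probability(hash_fn, p, n=N):
    """Exact P[hash(x) == hash(x XOR e)] for uniform x and i.i.d. flips at rate p."""
    total = 0.0
    for e in product((0, 1), repeat=n):
        weight = sum(e)
        prob_e = p ** weight * (1 - p) ** (n - weight)
        hits = sum(
            hash_fn(x) == hash_fn(tuple(a ^ b for a, b in zip(x, e)))
            for x in product((0, 1), repeat=n)
        )
        total += prob_e * hits / 2 ** n
    return total


if __name__ == "__main__":
    for p in (0.05, 0.2, 0.4):
        print(p,
              collision_probability(projection_hash, p),
              collision_probability(hamming_decode_hash, p))
```

The projection bucket of a point is a subcube, while the decoding bucket is a Hamming ball of radius 1 around a codeword; comparing the two printed columns for different values of p shows how the shape of the bucket changes the collision rate.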