SIAM Journal on Discrete Mathematics
One way to find near-matches in large datasets is to use hash functions. In recent years, locality-sensitive hash functions have been given for various metrics; for the Hamming metric, projecting onto k bits is a simple hash function that performs well. In this paper, we investigate alternatives to projection. For various parameters, hash functions given by complete decoding algorithms for error-correcting codes work better, and asymptotically, random codes perform better than projection.
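The two kinds of hash function contrasted above can be illustrated with a small sketch. It is not taken from the paper: the choice of the [7,4] Hamming code, the function names (`project_hash`, `decode_hash`, `neighbor_collision_rate`), and the restriction to pairs at Hamming distance 1 are all assumptions made for illustration. Both hashes split the 7-cube into 16 buckets, so their collision rates can be compared directly.

```python
from itertools import product

def project_hash(x, coords):
    """Projection hash: keep only the bits of x at the chosen coordinates."""
    return tuple(x[i] for i in coords)

# Parity-check matrix of the [7,4] Hamming code (illustrative choice):
# column j (1-based) is the binary expansion of j, least significant bit first.
H = [[(j >> b) & 1 for j in range(1, 8)] for b in range(3)]

def syndrome(x):
    """Syndrome of x over GF(2); all zeros exactly when x is a codeword."""
    return tuple(sum(h * xi for h, xi in zip(row, x)) % 2 for row in H)

def decode_hash(x):
    """Decoding hash: map x to its nearest codeword (complete decoding)."""
    s = syndrome(x)
    pos = s[0] + 2 * s[1] + 4 * s[2]  # 1-based position of the bit to flip, 0 if none
    y = list(x)
    if pos:
        y[pos - 1] ^= 1
    return tuple(y)

def neighbor_collision_rate(hash_fn, n=7):
    """Fraction of ordered pairs at Hamming distance 1 that share a bucket."""
    hits = total = 0
    for x in product((0, 1), repeat=n):
        for i in range(n):
            y = list(x)
            y[i] ^= 1
            hits += hash_fn(x) == hash_fn(tuple(y))
            total += 1
    return hits / total
```

With these definitions, `neighbor_collision_rate(decode_hash)` evaluates to 0.25, while projection onto four coordinates, `neighbor_collision_rate(lambda x: project_hash(x, (0, 1, 2, 3)))`, gives 3/7. This toy case only shows the mechanics of hashing by decoding; it does not reproduce the parameter ranges in which the abstract reports decoding-based hashes working better.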