HmSearch: an efficient Hamming distance query processing algorithm

Authors:
Xiaoyang Zhang; Jianbin Qin; Wei Wang; Yifang Sun; Jiaheng Lu
Affiliations:
University of New South Wales, Australia; University of New South Wales, Australia; University of New South Wales, Australia; University of New South Wales, Australia; Renmin University of China, China
Venue:
Proceedings of the 25th International Conference on Scientific and Statistical Database Management
Year:
2013

Abstract

Hamming distance measures the number of dimensions in which two vectors have different values. In applications such as pattern recognition, information retrieval, and databases, we often need to efficiently process Hamming distance queries, which retrieve the vectors in a database whose Hamming distance from a given query vector is no more than k. Existing work on efficient Hamming distance query processing suffers from one or more of the following limitations: it is applicable only to tiny error thresholds, it cannot deal with vectors whose value domain is large, or it cannot attain robust performance in the presence of data skew. In this paper, we propose HmSearch, an efficient query processing method for Hamming distance queries that addresses these limitations. Our method is based on improved enumeration-based signatures, enhanced filtering, and hierarchical binary filtering-and-verification. We also design an effective dimension rearrangement method to deal with data skew. Extensive experimental results demonstrate that our methods outperform state-of-the-art methods by up to two orders of magnitude.
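
To make the query definition concrete, the following is a minimal Python sketch of Hamming distance and of a Hamming distance query answered by a brute-force linear scan. It is not HmSearch: it uses no signatures, filtering, or dimension rearrangement, and only illustrates the problem that such methods aim to solve faster. The function names `hamming_distance` and `hamming_query` and the sample data are assumptions made for illustration.

```python
from typing import List, Sequence


def hamming_distance(u: Sequence, v: Sequence) -> int:
    """Count the dimensions in which u and v hold different values."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of dimensions")
    return sum(1 for a, b in zip(u, v) if a != b)


def hamming_query(database: List[Sequence], query: Sequence, k: int) -> List[int]:
    """Return the indices of database vectors within Hamming distance k of query.

    Brute-force baseline (hypothetical helper, not the paper's method):
    every vector is compared against the query.
    """
    return [i for i, vec in enumerate(database)
            if hamming_distance(vec, query) <= k]


# Illustrative data: binary vectors written as strings, one character per dimension.
database = ["10110", "10011", "00000", "10111"]
print(hamming_query(database, "10111", 1))  # prints [0, 1, 3]
```

The scan costs time proportional to the database size times the number of dimensions per query, which is the cost that index-based approaches try to avoid. Because the comparison is per element, the same sketch works for vectors over a large value domain, not only binary ones.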