Nearest Neighbor Retrieval Using Distance-Based Hashing

Authors:
Vassilis Athitsos;Michalis Potamias;Panagiotis Papapetrou;George Kollios
Affiliations:
Computer Science and Engineering Department, University of Texas at Arlington, Arlington, Texas, USA;Computer Science Department, Boston University, Boston, Massachusetts, USA;Computer Science Department, Boston University, Boston, Massachusetts, USA;Computer Science Department, Boston University, Boston, Massachusetts, USA
Venue:
ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Year:
2008

Cited by: 15

Nearest neighbor search methods for handshape recognition
Proceedings of the 1st international conference on PErvasive Technologies Related to Assistive Environments

Distributed similarity search in high dimensions using locality sensitive hashing
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology

Quality and efficiency in high dimensional nearest neighbor search
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data

HARRA: fast iterative hashed record linkage for large-scale data collections
Proceedings of the 13th International Conference on Extending Database Technology

Efficient and accurate nearest neighbor and closest pair search in high-dimensional space
ACM Transactions on Database Systems (TODS)

An efficient algorithm for reverse furthest neighbors query with metric index
DEXA'10 Proceedings of the 21st international conference on Database and expert systems applications: Part II

Efficient RkNN retrieval with arbitrary non-metric similarity measures
Proceedings of the VLDB Endowment

Nearest-neighbor search algorithms on non-Euclidean manifolds for computer vision applications
Proceedings of the Seventh Indian Conference on Computer Vision, Graphics and Image Processing

Multiple kernel learning for image indexing
Proceedings of the Seventh Indian Conference on Computer Vision, Graphics and Image Processing

Distributed similarity estimation using derived dimensions
The VLDB Journal - The International Journal on Very Large Data Bases

ISIS: a new approach for efficient similarity search in sparse databases
DASFAA'10 Proceedings of the 15th international conference on Database Systems for Advanced Applications - Volume Part II

A fast audio similarity retrieval method for millions of music tracks
Multimedia Tools and Applications

An efficient algorithm for arbitrary reverse furthest neighbor queries
APWeb'12 Proceedings of the 14th Asia-Pacific international conference on Web Technologies and Applications

Leveraging unlabeled data to scale blocking for record linkage
IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Three

In-network approximate computation of outliers with quality guarantees
Information Systems

Abstract

A method is proposed for indexing spaces with arbitrary distance measures, so as to achieve efficient approximate nearest neighbor retrieval. Hashing methods, such as Locality Sensitive Hashing (LSH), have been successfully applied for similarity indexing in vector spaces and string spaces under the Hamming distance. The key novelty of the hashing technique proposed here is that it can be applied to spaces with arbitrary distance measures, including non-metric distance measures. First, we describe a domain-independent method for constructing a family of binary hash functions. Then, we use these functions to construct multiple multibit hash tables. We show that the LSH formalism is not applicable for analyzing the behavior of these tables as index structures. We present a novel formulation that uses statistical observations from sample data to analyze retrieval accuracy and efficiency for the proposed indexing method. Experiments on several real-world data sets demonstrate that our method produces good trade-offs between accuracy and efficiency, and significantly outperforms VP-trees, which are a well-known method for distance-based indexing.
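
The abstract does not spell out how the binary hash functions are built, so the following Python sketch is only an illustration of the general idea under stated assumptions. It assumes each binary function projects an object onto a pair of randomly chosen pivot objects using nothing but the distance measure, then thresholds the projection at the median over the data. Several such bits are concatenated into a multibit key, and several independent tables are queried. The names `line_projection`, `make_binary_hash`, `DistanceHashIndex`, and the parameters `k` (bits per table) and `l` (number of tables) are hypothetical, and the median threshold is a simplification chosen for this sketch.

```python
import random
from collections import defaultdict


def line_projection(dist, x, p1, p2):
    """Pseudo-projection of x onto the 'line' through pivots p1 and p2.

    Only distance values are used, so x may come from any space,
    including one with a non-metric distance (assumption of this sketch).
    """
    d12 = dist(p1, p2)
    return (dist(x, p1) ** 2 + d12 ** 2 - dist(x, p2) ** 2) / (2.0 * d12)


def make_binary_hash(dist, sample, rng, max_tries=100):
    """Build one binary hash function from two random pivots.

    The threshold is the median projection of the sample, so roughly
    half of the objects map to 1 (a simplifying assumption).
    """
    for _ in range(max_tries):
        p1, p2 = rng.sample(sample, 2)
        if dist(p1, p2) > 0:
            break
    else:
        raise ValueError("could not find two pivots at nonzero distance")
    values = sorted(line_projection(dist, s, p1, p2) for s in sample)
    threshold = values[len(values) // 2]
    return lambda x: 1 if line_projection(dist, x, p1, p2) >= threshold else 0


class DistanceHashIndex:
    """l hash tables, each keyed by k concatenated binary hash values."""

    def __init__(self, dist, data, k=4, l=5, seed=0):
        rng = random.Random(seed)
        self.dist = dist
        self.data = list(data)
        self.tables = []
        for _ in range(l):
            funcs = [make_binary_hash(dist, self.data, rng) for _ in range(k)]
            buckets = defaultdict(list)
            for i, x in enumerate(self.data):
                buckets[self._key(funcs, x)].append(i)
            self.tables.append((funcs, buckets))

    @staticmethod
    def _key(funcs, x):
        return tuple(f(x) for f in funcs)

    def query(self, q):
        """Return the closest candidate that collides with q in any table,
        or None if no bucket matches (the search is approximate)."""
        candidates = set()
        for funcs, buckets in self.tables:
            candidates.update(buckets.get(self._key(funcs, q), ()))
        if not candidates:
            return None
        return min((self.data[i] for i in candidates),
                   key=lambda x: self.dist(q, x))


def l1(a, b):
    """Example distance for the sketch; any callable dist(a, b) would do."""
    return sum(abs(u - v) for u, v in zip(a, b))
```

In this sketch, raising `k` makes buckets smaller (fewer distance computations per query, lower chance of finding the true neighbor), while raising `l` adds tables and recovers accuracy at extra cost. That is the kind of accuracy and efficiency trade-off the abstract says is analyzed from sample data.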