Similarity search in metric databases through hashing

Authors:
Claudio Gennaro;Pasquale Savino;Pavel Zezula
Affiliations:
IEI-CNR, Pisa, Italy;IEI-CNR, Pisa, Italy;Masaryk Univ., Brno, Czech Republic
Venue:
MULTIMEDIA '01 Proceedings of the 2001 ACM workshops on Multimedia: multimedia information retrieval
Year:
2001

Abstract

A novel access structure for similarity search in metric databases, called Similarity Hashing (SH), is proposed. It is a multi-level hash structure, consisting of search-separable bucket sets on each level. The structure supports easy insertion and bounded search costs, because at most one bucket needs to be accessed at each level for range queries up to a pre-defined value of search radius. At the same time, the pivot-based strategy significantly reduces the number of distance computations. Contrary to tree organizations, the SH structure is suitable for distributed and parallel implementations.
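The abstract does not give the hash functions, so the following is only a minimal sketch of how a multi-level structure of this kind could behave, not the authors' implementation. It assumes one pivot per level and a ball-partitioning split: objects whose distance to the pivot is at most `median - rho` go to an inner bucket, objects farther than `median + rho` go to an outer bucket, and the rest fall through to the next level or to a final exclusion bucket. The class name `SimilarityHashSketch`, the parameters `median` and `rho`, and the Euclidean metric are all assumptions chosen for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Point, b: Point) -> float:
    """Example metric; any function satisfying the metric axioms would do."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


class SimilarityHashSketch:
    """Hypothetical multi-level hash with one pivot per level (an assumption)."""

    def __init__(self, pivots_and_medians: Sequence[Tuple[Point, float]],
                 rho: float, dist: Callable[[Point, Point], float] = euclidean):
        # Each level: pivot, split distance, inner bucket, outer bucket.
        # Buckets store (object, distance-to-pivot) pairs for later filtering.
        self.levels = [(p, m, [], []) for p, m in pivots_and_medians]
        self.rho = rho              # largest supported query radius
        self.dist = dist
        self.exclusion: List[Point] = []

    def insert(self, obj: Point) -> None:
        for pivot, median, inner, outer in self.levels:
            d = self.dist(obj, pivot)
            if d <= median - self.rho:
                inner.append((obj, d))
                return
            if d > median + self.rho:
                outer.append((obj, d))
                return
            # Otherwise the object lies in the exclusion zone of this level
            # and is passed on to the next one.
        self.exclusion.append(obj)

    def range_query(self, q: Point, r: float) -> List[Point]:
        if r > self.rho:
            raise ValueError("this sketch only handles radii up to rho")
        result: List[Point] = []
        for pivot, median, inner, outer in self.levels:
            dq = self.dist(q, pivot)
            # With r <= rho the query ball can overlap at most one of the
            # two buckets, so at most one bucket is read per level.
            if dq - r <= median - self.rho:
                bucket = inner
            elif dq + r > median + self.rho:
                bucket = outer
            else:
                bucket = []
            for obj, dp in bucket:
                # Pivot filtering: by the triangle inequality,
                # |d(q,p) - d(o,p)| > r implies d(q,o) > r, so the
                # distance d(q,o) need not be computed for such objects.
                if abs(dq - dp) <= r and self.dist(q, obj) <= r:
                    result.append(obj)
            # If the query ball lies entirely inside one bucket's region,
            # no qualifying object can have reached a deeper level.
            if dq + r <= median - self.rho or dq - r > median + self.rho:
                return result
        # Objects excluded at every level are checked by a plain scan.
        result.extend(o for o in self.exclusion if self.dist(q, o) <= r)
        return result


# Usage: a 10 x 10 grid of points, two levels, rho = 1.0.
points = [(float(i), float(j)) for i in range(10) for j in range(10)]
index = SimilarityHashSketch([((0.0, 0.0), 6.0), ((9.0, 0.0), 6.0)], rho=1.0)
for pt in points:
    index.insert(pt)
neighbours = index.range_query((4.5, 4.5), 1.0)
```

Because each level is consulted independently of the others, the levels in a sketch like this could be placed on different nodes, which is in the spirit of the abstract's remark about distributed and parallel implementations.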