Similarity grid for searching in metric spaces

  • Authors:
  • Michal Batko;Claudio Gennaro;Pavel Zezula

  • Affiliations:
  • Masaryk University, Brno, Czech Republic;ISTI-CNR, Pisa, Italy;Masaryk University, Brno, Czech Republic

  • Venue:
  • DELOS'04 Proceedings of the 6th Thematic conference on Peer-to-Peer, Grid, and Service-Orientation in Digital Library Architectures
  • Year:
  • 2004

Abstract

Similarity search in metric spaces is an important paradigm for content-based retrieval in many applications. Existing centralized search structures can speed up retrieval, but they do not scale to large volumes of data because response time grows linearly with the size of the searched file. The proposed GHT* index is a scalable, distributed structure. By exploiting parallelism in a dynamic network of computers, the GHT* achieves practically constant search time for similarity range queries in data sets of arbitrary size. The structure also scales well as the volume of retrieved data grows. Moreover, the small amount of routing information replicated on each server grows only logarithmically. At the same time, the potential for interquery parallelism increases as data sets grow, because the relative number of servers used by an individual query decreases. All these properties are verified by experiments on a prototype system using real-life data sets.
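
To make the notion of a similarity range query concrete, the following is a minimal, centralized sketch of range search over a generalized hyperplane partitioning, the classic technique that the "GHT" in the index name is assumed here to refer to. It is not the distributed GHT* algorithm from the paper: the function names, the dictionary-based node layout, the `BUCKET_SIZE` constant, the pivot choice (the first two objects), and the Euclidean metric are all illustrative assumptions.

```python
import math

BUCKET_SIZE = 4  # hypothetical leaf capacity, chosen only for illustration


def euclidean(a, b):
    """Example metric; any function satisfying the metric axioms would do."""
    return math.dist(a, b)


def build(objects, dist, bucket_size=BUCKET_SIZE):
    """Recursively partition objects by which of two pivots is closer."""
    if len(objects) <= bucket_size:
        return {"bucket": list(objects)}
    # Assumption: take the first two objects as pivots (real systems choose more carefully).
    p1, p2 = objects[0], objects[1]
    left, right = [], []
    for o in objects[2:]:
        (left if dist(o, p1) <= dist(o, p2) else right).append(o)
    return {
        "p1": p1,
        "p2": p2,
        "left": build(left, dist, bucket_size),
        "right": build(right, dist, bucket_size),
    }


def range_search(node, q, r, dist, out=None):
    """Collect every stored object o with dist(q, o) <= r."""
    if out is None:
        out = []
    if "bucket" in node:
        out.extend(o for o in node["bucket"] if dist(q, o) <= r)
        return out
    d1, d2 = dist(q, node["p1"]), dist(q, node["p2"])
    # The pivots are stored in the inner node, so test them directly.
    if d1 <= r:
        out.append(node["p1"])
    if d2 <= r:
        out.append(node["p2"])
    # By the triangle inequality, a subtree can be skipped when the query
    # ball lies entirely on the other side of the generalized hyperplane.
    if d1 - r <= d2 + r:
        range_search(node["left"], q, r, dist, out)
    if d1 + r >= d2 - r:
        range_search(node["right"], q, r, dist, out)
    return out
```

Calling `range_search(build(data, euclidean), q, r, euclidean)` should return the same objects as a linear scan of `data`, while visiting only the subtrees that the pruning conditions cannot rule out. In a distributed setting such as the one the abstract describes, the two subtrees would presumably live on different servers and be searched in parallel, but that part is not modeled here.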