SWAM: a family of access methods for similarity-search in peer-to-peer data networks

  • Authors: Farnoush Banaei-Kashani; Cyrus Shahabi
  • Affiliations: University of Southern California, Los Angeles, CA; University of Southern California, Los Angeles, CA
  • Venue: Proceedings of the thirteenth ACM international conference on Information and knowledge management
  • Year: 2004

Abstract

Peer-to-peer Data Networks (PDNs) are large-scale, self-organizing, distributed query processing systems. Familiar examples of PDNs are peer-to-peer file-sharing networks, which support exact-match search queries to locate user-requested files. In this paper, we formalize the more general problem of similarity search in PDNs and propose a family of distributed access methods, termed Small-World Access Methods (SWAM), for efficient execution of various similarity-search queries, namely exact-match, range, and k-nearest-neighbor (kNN) queries. Unlike its predecessors, i.e., LH* and DHTs, SWAM does not control the assignment of data objects to PDN nodes; each node autonomously stores its own data. In addition, SWAM supports all similarity-search queries on multiple attributes. SWAM guarantees that the query object will be found (if it exists in the network) in average time logarithmically proportional to the network size. Moreover, once the query object is found, all similar objects are in its proximate network neighborhood, which enables efficient range and kNN queries. As a specific instance of SWAM, we propose SWAM-V, a Voronoi-based SWAM that indexes PDNs with multi-attribute data objects. For a PDN with N nodes, SWAM-V has query time, communication cost, and computation cost of O(log N) for exact-match queries, O(log N + sN) for range queries (with selectivity s), and O(log N + k) for kNN queries. Our experiments show that SWAM-V consistently outperforms a similarity-search enabled version of CAN in query time and communication cost by a factor of 2 to 3.
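
To make the three query types named in the abstract concrete, the following is a minimal, centralized reference sketch of their semantics over multi-attribute objects. It is not the distributed SWAM or SWAM-V algorithm, which the abstract does not spell out. The Euclidean metric, the tuple representation of objects, and the function names `exact_match`, `range_query`, and `knn_query` are all assumptions made for illustration.

```python
import heapq
import math


def distance(a, b):
    """Euclidean distance between two multi-attribute objects (assumed metric)."""
    return math.dist(a, b)


def exact_match(objects, q):
    """Return the objects identical to the query object q."""
    return [o for o in objects if o == q]


def range_query(objects, q, radius):
    """Return every object within `radius` of q (hypothetical parameter name)."""
    return [o for o in objects if distance(o, q) <= radius]


def knn_query(objects, q, k):
    """Return the k objects closest to q, nearest first."""
    return heapq.nsmallest(k, objects, key=lambda o: distance(o, q))


# Toy data set: each tuple is one object with two attributes.
objects = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 4.0)]
q = (0.0, 0.0)

matches = exact_match(objects, (1.0, 0.0))
in_range = range_query(objects, q, 2.0)
nearest = knn_query(objects, q, 2)
selectivity = len(in_range) / len(objects)  # fraction of objects returned
```

In this toy data set the range query returns three of the four objects, so its selectivity s is 0.75. A brute-force scan like this touches every object, whereas the abstract claims O(log N + sN) for range queries and O(log N + k) for kNN queries in SWAM-V.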