An Efficient Parallel Algorithm for High Dimensional Similarity Join

  • Authors:
  • Affiliations:
  • Venue: IPPS '98: Proceedings of the 12th International Parallel Processing Symposium
  • Year: 1998

Abstract

Multidimensional similarity join finds pairs of multi-dimensional points that are within some small distance of each other. The ε-k-d-B tree has been proposed as a data structure that scales better as the number of dimensions increases compared to previous data structures. We present a cost model of the ε-k-d-B tree and use it to optimize the leaf size. We present novel parallel algorithms for the similarity join using the ε-k-d-B tree. A load-balancing strategy based on equi-depth histograms is shown to work well for uniform or low-skew situations, whereas another based on weighted equi-depth histograms works far better for high-skew datasets. The latter strategy is only slightly slower than the former strategy for low-skew datasets. Further, its cost is proportional to the overall cost of the similarity join.
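
To make the two central notions of the abstract concrete, the following is a minimal sketch, not the paper's algorithm. It shows a brute-force similarity self-join, which only defines the result that a tree-based method would compute more efficiently, and an equi-depth split of one coordinate into buckets of roughly equal point count. The function names, the choice of Euclidean distance, and the inclusive threshold are assumptions made for illustration; the abstract does not specify them.

```python
import math
from itertools import combinations


def similarity_join(points, eps):
    """Brute-force self-join (illustrative only, quadratic time).

    Returns index pairs (i, j) with i < j whose points lie within
    distance eps of each other. Euclidean distance is an assumption.
    """
    return [
        (i, j)
        for (i, p), (j, q) in combinations(enumerate(points), 2)
        if math.dist(p, q) <= eps
    ]


def equi_depth_boundaries(values, num_buckets):
    """Return num_buckets - 1 split values for one coordinate.

    Each bucket receives roughly the same number of values, so bucket
    widths adapt to where the data is dense (hypothetical helper).
    """
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(k * n) // num_buckets] for k in range(1, num_buckets)]
```

For example, `equi_depth_boundaries([1, 2, 3, 4, 5, 6, 7, 8], 4)` yields the split values `[3, 5, 7]`, so each of the four buckets holds two values. A weighted variant would balance a per-point cost estimate rather than a raw count; how that weight is defined is not stated in the abstract.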