An Efficient Parallel Algorithm for High Dimensional Similarity Join

Authors:
Affiliations:
Venue:
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
Year:
1998

Citing 4
Cited 0

High-Dimensional Similarity Joins

ICDE '97 Proceedings of the Thirteenth International Conference on Data Engineering
Parallel Algorithms for High-dimensional Similarity Joins for Data Mining Applications

VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
Fast Similarity Search in the Presence of Noise, Scaling, and Translation in Time-Series Databases

VLDB '95 Proceedings of the 21th International Conference on Very Large Data Bases
A One-Pass Algorithm for Accurately Estimating Quantiles for Disk-Resident Data

VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases

Quantified Score

Hi-index	0.00

Visualization

Abstract

Multidimensional similarity join finds pairs of multi-dimensional points that are within some small distance of each other: The 驴-k-d-B tree has been proposed as a data structure that scales better as the number of dimensions increases compared to previous data structures. We present a cost model of the 驴-k-d-B tree and use it to optimize the leaf size.We present novel parallel algorithms for the similarity join using the 驴-k-d-B tree. A load-balancing strategy based on equi-depth histograms is shown to work well for uniform or low-skew situations, whereas another based on weighted equi-depth histograms works far better for high-skew datasets. The latter strategy is only slightly slower than the former strategy for low skew datasets. Further, its cost is proportional to the overall cost of the similarity join.