High-Dimensional Similarity Joins
ICDE '97 Proceedings of the Thirteenth International Conference on Data Engineering
Parallel Algorithms for High-dimensional Similarity Joins for Data Mining Applications
VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
Fast Similarity Search in the Presence of Noise, Scaling, and Translation in Time-Series Databases
VLDB '95 Proceedings of the 21th International Conference on Very Large Data Bases
A One-Pass Algorithm for Accurately Estimating Quantiles for Disk-Resident Data
VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
Multidimensional similarity join finds pairs of multi-dimensional points that are within some small distance of each other. The ε-k-d-B tree has been proposed as a data structure that scales better than previous data structures as the number of dimensions increases. We present a cost model of the ε-k-d-B tree and use it to optimize the leaf size. We present novel parallel algorithms for the similarity join using the ε-k-d-B tree. A load-balancing strategy based on equi-depth histograms is shown to work well for uniform or low-skew situations, whereas another based on weighted equi-depth histograms works far better for high-skew datasets. The latter strategy is only slightly slower than the former strategy for low-skew datasets. Further, its cost is proportional to the overall cost of the similarity join.
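As a rough illustration of two ideas named in the abstract, the sketch below shows a brute-force similarity join and a plain equi-depth split of one dimension. It is not the paper's ε-k-d-B tree algorithm or its parallel scheme. The function names `similarity_join` and `equi_depth_boundaries` are hypothetical, and the Euclidean metric is an assumption, since the abstract only says "within some small distance".

```python
from itertools import combinations
from math import dist


def similarity_join(points, eps):
    """Brute-force reference join (assumes Euclidean distance).

    Returns every index pair (i, j), i < j, whose points lie within eps
    of each other. This is O(n^2) and only serves to define the result
    that an index-based join is expected to produce.
    """
    indexed = enumerate(points)
    return [
        (i, j)
        for (i, p), (j, q) in combinations(indexed, 2)
        if dist(p, q) <= eps
    ]


def equi_depth_boundaries(values, parts):
    """Return parts - 1 split values for an equi-depth histogram.

    Each bucket receives roughly the same number of values, so a skewed
    value range does not by itself overload one bucket.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(k * n) // parts] for k in range(1, parts)]


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.05, 1.0)]
    print(similarity_join(pts, eps=0.2))

    first_dim = [1, 2, 3, 4, 100, 200, 300, 400]
    print(equi_depth_boundaries(first_dim, parts=4))
```

In a parallel setting, boundaries like these could be used to assign ranges of one dimension to processors so that each receives a similar number of points. The weighted variant mentioned in the abstract is not sketched here because the abstract does not say how the weights are defined.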