Similarity join, a basic operation for multimedia databases, returns all pairs of points whose distance is bounded by a given parameter ε. In this paper, properties of index-based join algorithms are studied and a highly efficient, near-optimal similarity join algorithm is proposed. Our algorithm uses a breadth-first strategy and lets the cache content guide the join computation and I/O access. In contrast with many other proposed join algorithms, ours is essentially independent of the ordering strategy and requires only minimal cache capacity. As a result, a more precise plan for the sequence of join computations and I/O accesses can be realized; in general, each page is accessed and processed only once. A qualitative and quantitative analysis of the algorithm's performance is provided. Although only similarity join based on the R-tree (a common index structure) is discussed in this paper, the idea can be generalized to other join algorithms without substantial difficulty. Experiments based on our analysis indicate that the new algorithm yields superior performance across a wide range of dimensions and database sizes.
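To make the definition of the operation concrete, the following is a minimal Python sketch. It is not the algorithm proposed in the paper. It shows only (a) the brute-force ε-join as defined above and (b) the standard bounding-rectangle pruning test that index-based joins typically rely on: two pages can be skipped when the minimum distance between their bounding rectangles exceeds ε. The function names, the use of Euclidean distance, and the representation of a "page" as a plain list of points are assumptions made for illustration.

```python
import math
from itertools import combinations


def similarity_join(points, eps):
    """Brute-force self-join: index pairs (i, j), i < j, with distance <= eps.

    Euclidean distance is an assumption; the abstract does not fix a metric.
    """
    return [
        (i, j)
        for (i, p), (j, q) in combinations(enumerate(points), 2)
        if math.dist(p, q) <= eps
    ]


def mbr(points):
    """Minimum bounding rectangle of a non-empty point list, as (lows, highs)."""
    lows = tuple(min(coords) for coords in zip(*points))
    highs = tuple(max(coords) for coords in zip(*points))
    return lows, highs


def min_dist(a, b):
    """Smallest possible Euclidean distance between a point in MBR a and one in MBR b."""
    gaps = (
        max(a_lo - b_hi, b_lo - a_hi, 0.0)
        for a_lo, a_hi, b_lo, b_hi in zip(a[0], a[1], b[0], b[1])
    )
    return math.sqrt(sum(g * g for g in gaps))


def join_pages(page_a, page_b, eps):
    """Join two hypothetical pages, skipping the pair if their MBRs are too far apart."""
    if min_dist(mbr(page_a), mbr(page_b)) > eps:
        return []  # no point pair can qualify, so no distance computation is needed
    return [(p, q) for p in page_a for q in page_b if math.dist(p, q) <= eps]
```

For example, `similarity_join([(0, 0), (0.5, 0), (3, 3)], 1.0)` compares all three pairs, while `join_pages` on two pages whose rectangles lie more than ε apart returns immediately without comparing any points. In a real index-based join, the order in which page pairs are loaded and joined is what the scheduling strategy decides; this sketch leaves that aspect out entirely.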