An empirical study on selective partitioning dimensions for partition-based similarity joins

Authors:
Hyoseop Shin
Affiliations:
Department of Internet and Multimedia Engineering, Konkuk University, Seoul, Republic of Korea
Venue:
Data & Knowledge Engineering
Year:
2007

Abstract

Real-world application data are usually distributed sparsely and non-uniformly in a very large high-dimensional space. Hence, selecting effective partitioning dimensions is crucial for partition-based similarity joins. In this paper, we present and evaluate two data partitioning algorithms. PerDimSelect selects a subset of the original perpendicular dimension axes and maps each data point into the reduced-dimensional space. DiaDimSelect creates a single one-dimensional axis by combining some of the original perpendicular dimensions and maps each data point onto the newly created axis. In the experiments, several measures are used to compare the performance of the algorithms, including CPU cost, total response time, and the number of buckets created. In conclusion, DiaDimSelect performs better than PerDimSelect because it creates far fewer partition buckets as the number of partitioning dimensions increases, which keeps the I/O cost low while decreasing CPU cost considerably.
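
The difference between the two mappings can be illustrated with a minimal sketch. The abstract does not say how either algorithm chooses its dimensions or how the combined axis is constructed, so everything below is an assumption made for illustration: the function names per_dim_buckets and dia_dim_buckets are hypothetical, the partitioning dimensions are passed in by the caller, cells have width eps (the join distance threshold), and the diagonal axis is taken to be the normalized sum of the selected coordinates.

```python
import math
from collections import defaultdict


def per_dim_buckets(points, dims, eps):
    """Assumed PerDimSelect-style mapping (illustrative only).

    Each point is placed in a grid cell over the selected perpendicular
    axes, so the bucket key is a tuple with one cell index per axis.
    """
    buckets = defaultdict(list)
    for idx, p in enumerate(points):
        key = tuple(int(p[d] // eps) for d in dims)
        buckets[key].append(idx)
    return buckets


def dia_dim_buckets(points, dims, eps):
    """Assumed DiaDimSelect-style mapping (illustrative only).

    The selected coordinates are combined into one value by projecting
    the point onto the unit diagonal of the selected axes, so the bucket
    key is a single cell index along that diagonal.
    """
    norm = math.sqrt(len(dims))  # length of the all-ones vector over dims
    buckets = defaultdict(list)
    for idx, p in enumerate(points):
        proj = sum(p[d] for d in dims) / norm
        buckets[int(proj // eps)].append(idx)
    return buckets


if __name__ == "__main__":
    # Hypothetical data: four points in a 3-dimensional unit cube.
    pts = [(0.05, 0.05, 0.9), (0.15, 0.95, 0.1),
           (0.95, 0.15, 0.5), (0.55, 0.55, 0.3)]
    print(len(per_dim_buckets(pts, (0, 1), 0.1)), "perpendicular buckets")
    print(len(dia_dim_buckets(pts, (0, 1), 0.1)), "diagonal buckets")
```

Under these assumptions, data in the unit cube with k partitioning dimensions can occupy up to (1/eps)^k cells in the perpendicular grid, but only about sqrt(k)/eps cells along the diagonal. This is one plausible reading of why the bucket count grows far more slowly for DiaDimSelect. Because projection onto a unit vector never increases distances, two points within eps of each other fall in the same or adjacent diagonal cells, so a join over this sketch would only need to compare neighboring buckets.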