Many emerging data mining applications require a similarity join between points in a high-dimensional domain. We present a new algorithm that uses a new index structure, called the $\epsilon$ tree, for fast spatial similarity joins on high-dimensional points. This index structure reduces the number of neighboring leaf nodes that are considered for the join test, as well as the traversal cost of finding appropriate branches in the internal nodes. The storage cost for internal nodes is independent of the number of dimensions. Hence, the proposed index structure scales to high-dimensional data. We analyze the cost of the join for the $\epsilon$ tree and the R-tree family, and show that the $\epsilon$ tree will perform better for high-dimensional joins. Empirical evaluation, using synthetic and real-life data sets, shows that a similarity join using the $\epsilon$ tree is between two times and an order of magnitude faster than one using the $R^+$ tree, with the performance gap increasing with the number of dimensions. We also discuss how some of the ideas of the $\epsilon$ tree can be applied to the R-tree family. These biased R-trees perform better than the corresponding traditional R-trees for high-dimensional similarity joins, but do not match the performance of the $\epsilon$ tree.
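
To make the join operation concrete, the following is a minimal sketch of a similarity join: it reports every pair of points whose Euclidean distance is at most $\epsilon$. It is not the $\epsilon$ tree, whose internals are not described in this abstract. As a stand-in, it assumes a simple pruning scheme that partitions points into stripes of width $\epsilon$ along the first dimension only, so that each stripe needs to be compared with itself and one neighboring stripe regardless of the number of dimensions. The function name `similarity_join` and the stripe scheme are illustrative assumptions.

```python
import math
from collections import defaultdict


def similarity_join(points, eps):
    """Return sorted index pairs (i, j), i < j, with distance <= eps.

    Illustrative stand-in only: points are bucketed into stripes of
    width eps along dimension 0. This is NOT the epsilon tree.
    """
    # Assign each point to the stripe containing its first coordinate.
    stripes = defaultdict(list)
    for idx, point in enumerate(points):
        stripes[math.floor(point[0] / eps)].append(idx)

    pairs = []
    for key, members in stripes.items():
        # Two points within eps of each other differ by at most eps in
        # dimension 0, so they lie in the same or in adjacent stripes.
        # Checking only stripe key + 1 visits each stripe pair once.
        next_members = stripes.get(key + 1, [])
        for pos, i in enumerate(members):
            # Same-stripe candidates (indices were appended in order).
            for j in members[pos + 1:]:
                if math.dist(points[i], points[j]) <= eps:
                    pairs.append((i, j))
            # Candidates from the adjacent stripe.
            for j in next_members:
                if math.dist(points[i], points[j]) <= eps:
                    pairs.append((min(i, j), max(i, j)))
    return sorted(pairs)


if __name__ == "__main__":
    sample = [(0.0, 0.0), (0.5, 0.0), (3.0, 3.0), (3.2, 3.1), (10.0, 10.0)]
    print(similarity_join(sample, eps=1.0))
```

The stripe partition only prunes on one dimension, so the exact distance test still does most of the work; an index such as the one described above aims to reduce the number of candidate pairs that reach that test.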