Many emerging data mining applications require a similarity join between points in a high-dimensional domain. We present a new algorithm that uses a new index structure, called the epsilon-kdB tree, for fast spatial similarity joins on high-dimensional points. This index structure reduces the number of neighboring leaf nodes that are considered for the join test, as well as the traversal cost of finding appropriate branches in the internal nodes. The storage cost for internal nodes is independent of the number of dimensions; hence the proposed index structure scales to high-dimensional data. Empirical evaluation, using synthetic and real-life datasets, shows that similarity join using the epsilon-kdB tree is from 2 times to an order of magnitude faster than the R+ tree, with the performance gap increasing with the number of dimensions.
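To make the similarity join operation concrete, here is a minimal Python sketch. It is not the epsilon-kdB tree: it is a simplified, single-level stand-in that cuts only the first dimension into stripes of width epsilon, so that each stripe needs to be compared only with itself and its immediate neighbor. The function name `eps_join`, the use of Euclidean distance, and the sample points are assumptions made for illustration and do not come from the paper.

```python
import math
from collections import defaultdict
from itertools import combinations


def eps_join(points, eps):
    """Return the set of index pairs (i, j), i < j, with distance <= eps.

    Illustrative sketch only: one-dimensional striping, not the
    epsilon-kdB tree. Euclidean distance is an assumption.
    """
    # Bucket point indices into stripes of width eps along dimension 0.
    stripes = defaultdict(list)
    for idx, p in enumerate(points):
        stripes[math.floor(p[0] / eps)].append(idx)

    result = set()
    for key, members in stripes.items():
        # Candidate pairs inside the same stripe.
        for i, j in combinations(members, 2):
            if math.dist(points[i], points[j]) <= eps:
                result.add((min(i, j), max(i, j)))
        # Candidate pairs with the next stripe only. Points that are two
        # or more stripes apart differ by more than eps in dimension 0.
        # .get() is used so the defaultdict is not mutated while iterating.
        for i in members:
            for j in stripes.get(key + 1, ()):
                if math.dist(points[i], points[j]) <= eps:
                    result.add((min(i, j), max(i, j)))
    return result


# Hypothetical sample data: five 2-D points and a join radius of 0.1.
sample = [(0.0, 0.0), (0.05, 0.0), (0.5, 0.5), (0.55, 0.5), (0.12, 0.0)]
pairs = eps_join(sample, 0.1)
```

Striping prunes most candidate pairs without computing a full distance. As the abstract describes, the epsilon-kdB tree additionally reduces the number of neighboring leaf nodes considered and keeps internal-node storage independent of the number of dimensions, neither of which this sketch attempts.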