Efficient processing of high-dimensional similarity joins plays an important role in a wide variety of data-driven applications. In this paper, we consider the $$\varepsilon$$-join variant of the problem. Given two $$d$$-dimensional datasets and a parameter $$\varepsilon$$, the task is to find all pairs of points, one from each dataset, that are within distance $$\varepsilon$$ of each other. We propose a new $$\varepsilon$$-join algorithm, called Super-EGO, which belongs to the EGO family of join algorithms. The new algorithm gains its advantage from a novel data-driven dimensionality re-ordering technique, a new EGO strategy that more aggressively avoids unnecessary computation, and a parallel version of the algorithm. We study the newly proposed Super-EGO algorithm on large real and synthetic datasets. The empirical study demonstrates a significant advantage of the proposed solution over existing state-of-the-art techniques.
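To make the problem statement concrete, the following is a minimal Python sketch of the ε-join itself. It is not Super-EGO and includes neither dimensionality re-ordering nor parallelism. The function names `brute_force_join` and `grid_join` are hypothetical, Euclidean distance is assumed, and the grid variant is only a generic illustration of pruning with an ε-wide grid: two points within distance ε differ by at most ε in every coordinate, so they must fall in the same or in adjacent grid cells.

```python
from collections import defaultdict
from itertools import product
from math import dist, floor


def brute_force_join(A, B, eps):
    """Return all index pairs (i, j) with dist(A[i], B[j]) <= eps."""
    return {(i, j)
            for i, a in enumerate(A)
            for j, b in enumerate(B)
            if dist(a, b) <= eps}


def grid_join(A, B, eps):
    """Same result as brute_force_join, comparing only nearby grid cells."""
    def cell(p):
        # Cell coordinates of point p in a grid with side length eps.
        return tuple(floor(x / eps) for x in p)

    # Bucket the points of B by grid cell.
    grid = defaultdict(list)
    for j, b in enumerate(B):
        grid[cell(b)].append(j)

    d = len(A[0]) if A else 0
    result = set()
    for i, a in enumerate(A):
        c = cell(a)
        # Visit the 3^d cells that can contain a partner of a.
        for offset in product((-1, 0, 1), repeat=d):
            neighbor = tuple(ci + oi for ci, oi in zip(c, offset))
            for j in grid.get(neighbor, ()):
                if dist(a, B[j]) <= eps:
                    result.add((i, j))
    return result


A = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
B = [(0.1, 0.0), (1.0, 1.4), (9.0, 9.0)]
print(sorted(grid_join(A, B, 0.5)))  # [(0, 0), (1, 1)]
```

Enumerating 3^d neighboring cells is only practical when d is small, which is one reason high-dimensional joins call for more careful strategies than this sketch.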