An important but very expensive primitive operation in high-dimensional databases is the K-Nearest Neighbor (KNN) similarity join. The operation pairs each point of one dataset with its K nearest neighbors in the other dataset, and it provides more meaningful query results than the range similarity join. Such an operation is useful for data mining and similarity search. In this paper, we propose a novel KNN-join algorithm called the Gorder (or G-ordering KNN) join method. Gorder is a block nested loop join method that exploits sorting, join scheduling, and distance computation filtering and reduction to lower both I/O and CPU costs. It sorts the input datasets into the G-order and applies the scheduled block nested loop join to the G-ordered data. Distance computation reduction is then employed to further reduce CPU cost. The method is simple, yet it handles high-dimensional data efficiently. Extensive experiments on both synthetic cluster and real-life datasets were conducted, and the results show that Gorder is an efficient KNN-join method that outperforms existing methods by a wide margin.
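To make the operation concrete, the following is a minimal sketch of a KNN join evaluated as a plain block nested loop, with an early-abandoning distance computation standing in for the idea of distance computation reduction. It is not the Gorder algorithm: the G-order sorting and the join scheduling are omitted because the abstract does not specify them. The names `knn_join`, `bounded_sq_dist`, and `block_size` are hypothetical and chosen only for illustration, and Euclidean distance is an assumption.

```python
import heapq
from typing import Dict, List, Sequence

Point = Sequence[float]


def bounded_sq_dist(p: Point, q: Point, bound: float) -> float:
    """Squared Euclidean distance between p and q.

    Stops accumulating and returns infinity as soon as the partial sum
    exceeds `bound`, so hopeless candidates cost fewer operations.
    (Illustrative stand-in for distance computation reduction.)
    """
    total = 0.0
    for a, b in zip(p, q):
        total += (a - b) ** 2
        if total > bound:
            return float("inf")
    return total


def knn_join(R: List[Point], S: List[Point], k: int,
             block_size: int = 2) -> Dict[int, List[int]]:
    """For each point in R, return the indices of its k nearest points in S.

    Plain block nested loop: every block of R is compared with every
    block of S. No sorting or scheduling is applied (assumption: this
    is a simplification, not the method described in the abstract).
    """
    # heaps[i] is a max-heap of (-squared_distance, index_in_S) for R[i].
    heaps: Dict[int, list] = {i: [] for i in range(len(R))}
    for r_start in range(0, len(R), block_size):
        r_block = range(r_start, min(r_start + block_size, len(R)))
        for s_start in range(0, len(S), block_size):
            s_block = range(s_start, min(s_start + block_size, len(S)))
            for i in r_block:
                heap = heaps[i]
                for j in s_block:
                    # Current k-th best distance, or no bound yet.
                    bound = -heap[0][0] if len(heap) == k else float("inf")
                    d = bounded_sq_dist(R[i], S[j], bound)
                    if len(heap) < k:
                        heapq.heappush(heap, (-d, j))
                    elif d < bound:
                        heapq.heapreplace(heap, (-d, j))
    # Report neighbor indices ordered from nearest to farthest.
    return {i: [j for _, j in sorted((-nd, j) for nd, j in heap)]
            for i, heap in heaps.items()}


if __name__ == "__main__":
    R = [(0.0, 0.0), (10.0, 10.0)]
    S = [(1.0, 0.0), (0.0, 2.0), (9.0, 10.0), (10.0, 13.0), (5.0, 5.0)]
    print(knn_join(R, S, k=2))
```

In this sketch the pruning bound for each point of R is the distance to its current k-th candidate, so the bound only becomes useful once k candidates have been seen. Processing nearby blocks first would tighten the bound earlier, which is the kind of benefit the abstract attributes to sorting and join scheduling.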