Gorder: an efficient method for KNN join processing

Authors: Chenyi Xia, Hongjun Lu, Beng Chin Ooi, Jing Hu
Affiliations: Department of Computer Science, National University of Singapore (Xia, Ooi, Hu); Department of Computer Science, Hong Kong University of Science and Technology (Lu)
Venue: VLDB '04: Proceedings of the Thirtieth International Conference on Very Large Data Bases, Volume 30
Year: 2004

Abstract

An important but very expensive primitive operation in high-dimensional databases is the K-Nearest Neighbor (KNN) similarity join. The operation combines each point of one dataset with its KNNs in the other dataset, and it provides more meaningful query results than the range similarity join. Such an operation is useful for data mining and similarity search. In this paper, we propose a novel KNN-join algorithm, called the Gorder (or G-ordering KNN) join method. Gorder is a block nested loop join method that exploits sorting, join scheduling, and distance computation filtering and reduction to reduce both I/O and CPU costs. It sorts the input datasets into the G-order and applies the scheduled block nested loop join to the G-ordered data. Distance computation reduction is employed to further reduce CPU cost. The method is simple, yet it handles high-dimensional data efficiently. Extensive experiments were conducted on both synthetic clustered and real-life datasets, and the results show that Gorder is an efficient KNN-join method that outperforms existing methods by a wide margin.
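The abstract names Gorder's ingredients (sorting, a scheduled block nested loop join, filtering, and distance computation reduction) without defining them precisely. The Python sketch below illustrates how such ingredients can fit together, under assumptions that are not taken from the paper: the G-order is approximated by sorting points lexicographically on per-dimension grid-cell numbers (any preprocessing transform the full method may apply is omitted), block filtering uses the minimum distance (MinDist) between block bounding boxes, and distance computation reduction is approximated by abandoning a partial squared-distance sum once it exceeds the current k-th nearest distance. All names (`grid_key`, `knn_join`, `block_size`, `cells`, and so on) are hypothetical, and coordinates are assumed to lie in `[lo, hi]`. This is a minimal illustration, not the authors' implementation.

```python
import heapq
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Box = Tuple[List[float], List[float]]


def grid_key(p: Point, cells: int, lo: float, hi: float) -> Tuple[int, ...]:
    """Hypothetical simplified G-order key: per-dimension grid-cell numbers,
    compared lexicographically (no preprocessing transform is applied)."""
    width = (hi - lo) / cells
    return tuple(min(int((x - lo) / width), cells - 1) for x in p)


def bounding_box(block: Sequence[Point]) -> Box:
    """Per-dimension minima and maxima of a non-empty block of points."""
    dims = range(len(block[0]))
    return ([min(p[d] for p in block) for d in dims],
            [max(p[d] for p in block) for d in dims])


def min_dist_sq(a: Box, b: Box) -> float:
    """Squared minimum distance between two boxes (0.0 if they overlap)."""
    total = 0.0
    for alo, ahi, blo, bhi in zip(a[0], a[1], b[0], b[1]):
        gap = max(blo - ahi, alo - bhi, 0.0)
        total += gap * gap
    return total


def dist_sq_bounded(p: Point, q: Point, bound: float) -> float:
    """Squared Euclidean distance; returns inf as soon as the partial sum exceeds bound."""
    total = 0.0
    for x, y in zip(p, q):
        total += (x - y) ** 2
        if total > bound:
            return math.inf
    return total


def knn_join(R: List[Point], S: List[Point], k: int, block_size: int = 8,
             cells: int = 4, lo: float = 0.0, hi: float = 1.0) -> List[List[int]]:
    """For each point of R, return the indices of its k nearest S points, closest first."""
    if k < 1:
        raise ValueError("k must be at least 1")

    def sorted_blocks(pts: List[Point]) -> List[List[int]]:
        # Sort point indices by grid key, then cut the sorted order into fixed-size blocks.
        idx = sorted(range(len(pts)), key=lambda i: grid_key(pts[i], cells, lo, hi))
        return [idx[i:i + block_size] for i in range(0, len(idx), block_size)]

    s_blocks = [(b, bounding_box([S[j] for j in b])) for b in sorted_blocks(S)]
    heaps: List[List[Tuple[float, int]]] = [[] for _ in R]  # max-heaps of (-dist_sq, j)

    def kth(i: int) -> float:
        """Current k-th smallest squared distance for R[i] (inf until k are found)."""
        return -heaps[i][0][0] if len(heaps[i]) == k else math.inf

    for r_block in sorted_blocks(R):
        r_box = bounding_box([R[i] for i in r_block])
        # Join scheduling: visit S blocks in increasing MinDist to this R block.
        schedule = sorted((min_dist_sq(r_box, box), n) for n, (_, box) in enumerate(s_blocks))
        for md, n in schedule:
            # Filtering: this and every later S block is too far for any point in r_block.
            if md > max(kth(i) for i in r_block):
                break
            for i in r_block:
                for j in s_blocks[n][0]:
                    d = dist_sq_bounded(R[i], S[j], kth(i))
                    if len(heaps[i]) < k:
                        heapq.heappush(heaps[i], (-d, j))
                    elif d < kth(i):
                        heapq.heapreplace(heaps[i], (-d, j))
    return [[j for _, j in sorted((-nd, j) for nd, j in h)] for h in heaps]


if __name__ == "__main__":
    random.seed(0)
    R = [tuple(random.random() for _ in range(2)) for _ in range(20)]
    S = [tuple(random.random() for _ in range(2)) for _ in range(30)]
    print(knn_join(R, S, k=3)[0])  # indices of the 3 S points nearest to R[0]
```

Because the S blocks are visited in increasing MinDist order, the loop can stop at the first block whose MinDist exceeds every k-th distance in the current R block: no later block can be closer. Sorting both inputs by the same grid key tends to put nearby points into the same block, which is what lets this cutoff trigger early; on uniformly random data the savings are typically smaller than on clustered data.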