The k-Nearest Neighbour Join: Turbo Charging the KDD Process

Authors:
Christian Böhm;Florian Krebs
Affiliations:
University of Munich, Oettingenstr. 67, 80538, München, Germany;University of Munich, Oettingenstr. 67, 80538, München, Germany
Venue:
Knowledge and Information Systems
Year:
2004

Citing 0
Cited 25

SEA-CNN: Scalable Processing of Continuous K-Nearest Neighbor Queries in Spatio-temporal Databases
ICDE '05 Proceedings of the 21st International Conference on Data Engineering

BORDER: Efficient Computation of Boundary Points
IEEE Transactions on Knowledge and Data Engineering

Efficient index-based KNN join processing for high-dimensional data
Information and Software Technology

A fast all nearest neighbor algorithm for applications involving large point-clouds
Computers and Graphics

Gorder: an efficient method for KNN join processing
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30

Seamlessly integrating similarity queries in SQL
Software—Practice & Experience

Periodic Pattern Analysis in Time Series Databases
DASFAA '09 Proceedings of the 14th International Conference on Database Systems for Advanced Applications

Design and evaluation of trajectory join algorithms
Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems

High-dimensional kNN joins with incremental updates
Geoinformatica

Adaptive k-nearest-neighbor classification using a dynamic number of nearest neighbors
ADBIS'07 Proceedings of the 11th East European conference on Advances in databases and information systems

Optimizing all-nearest-neighbor queries with trigonometric pruning
SSDBM'10 Proceedings of the 22nd international conference on Scientific and statistical database management

Instant code clone search
Proceedings of the eighteenth ACM SIGSOFT international symposium on Foundations of software engineering

A fast hybrid classification algorithm based on the minimum distance and the k-NN classifiers
Proceedings of the Fourth International Conference on SImilarity Search and APplications

Finding the sites with best accessibilities to amenities
DASFAA'11 Proceedings of the 16th international conference on Database systems for advanced applications: Part II

DeLi-Clu: boosting robustness, completeness, usability, and efficiency of hierarchical clustering by a closest pair ranking
PAKDD'06 Proceedings of the 10th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining

Closest pair queries with spatial constraints
PCI'05 Proceedings of the 10th Panhellenic conference on Advances in Informatics

Finding data broadness via generalized nearest neighbors
EDBT'06 Proceedings of the 10th international conference on Advances in Database Technology

Scalable continuous query processing and moving object indexing in spatio-temporal databases
EDBT'06 Proceedings of the 2006 international conference on Current Trends in Database Technology

Efficient parallel kNN joins for large data in MapReduce
Proceedings of the 15th International Conference on Extending Database Technology

Efficient processing of k nearest neighbor joins using MapReduce
Proceedings of the VLDB Endowment

Parallel k-most similar neighbor classifier for mixed data
IDEAL'12 Proceedings of the 13th international conference on Intelligent Data Engineering and Automated Learning

A fast k-neighborhood algorithm for large point-clouds
SPBG'06 Proceedings of the 3rd Eurographics / IEEE VGTC conference on Point-Based Graphics

Nearest group queries
Proceedings of the 25th International Conference on Scientific and Statistical Database Management

Similarity queries: their conceptual evaluation, transformations, and processing
The VLDB Journal — The International Journal on Very Large Data Bases

Reverse-k-Nearest-Neighbor join processing
SSTD'13 Proceedings of the 13th international conference on Advances in Spatial and Temporal Databases

Abstract

The similarity join has become an important database primitive for supporting similarity searches and data mining. A similarity join combines two sets of complex objects such that the result contains all pairs of similar objects. Two types of similarity join are well known: the distance range join, in which the user defines a distance threshold for the join, and the closest pair query or k-distance join, which retrieves the k most similar pairs. In this paper, we propose an important third similarity join operation, the k-nearest neighbour join, which combines each point of one point set with its k nearest neighbours in the other set. We observe that many standard algorithms of Knowledge Discovery in Databases (KDD), such as k-means and k-medoid clustering, nearest neighbour classification, data cleansing, and postprocessing of sampling-based data mining, can be implemented on top of the k-nn join operation to improve performance without affecting the quality of their results. We propose a new algorithm to compute the k-nearest neighbour join using the multipage index (MuX), a specialised index structure for the similarity join. To reduce both CPU and I/O costs, we develop optimal loading and processing strategies.
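
The difference between the three join types named in the abstract can be illustrated with a minimal sketch. It assumes small in-memory point sets and Euclidean distance, and it uses brute-force comparison of all pairs; it is not the MuX-based algorithm the paper proposes, and the function names are hypothetical, chosen only for this illustration.

```python
import heapq
import math
from itertools import product


def dist(p, q):
    """Euclidean distance between two points given as coordinate tuples."""
    return math.dist(p, q)


def distance_range_join(R, S, eps):
    """All pairs (r, s) whose distance does not exceed the threshold eps."""
    return [(r, s) for r, s in product(R, S) if dist(r, s) <= eps]


def k_distance_join(R, S, k):
    """The k most similar pairs overall (closest pair query)."""
    return heapq.nsmallest(k, product(R, S), key=lambda pair: dist(*pair))


def knn_join(R, S, k):
    """Each point of R combined with its k nearest neighbours in S."""
    return {r: heapq.nsmallest(k, S, key=lambda s: dist(r, s)) for r in R}


# Illustrative data only (assumed, not taken from the paper).
R = [(0.0, 0.0), (10.0, 10.0)]
S = [(1.0, 0.0), (0.0, 2.0), (9.5, 10.0), (20.0, 20.0)]

range_pairs = distance_range_join(R, S, eps=1.5)
closest_pairs = k_distance_join(R, S, k=2)
neighbours = knn_join(R, S, k=2)
```

Note that the k-distance join returns k pairs in total, whereas the k-nn join returns k neighbours for every point of the first set, so its result size grows with that set.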